Efficient transfer of huge files between machines

I need to transfer 250 GB of data between two machines over the LAN.

Ladies and gentlemen, the tar trick. I only need this every two to five years, but when I need it, I really need it.

This tars up the contents of a directory and streams it as compressed data to a remote host. This is super efficient as:

1. The data is compressed before it is transferred, so less data has to cross the network.

2. Compression and decompression are distributed and run in parallel. Instead of zipping everything up, transferring it, and then unzipping it, both ends work simultaneously as the data streams through, so the compress/decompress time overlaps with the transfer instead of adding to it.
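To see both points without a second machine, here is a small local sketch where a plain pipe stands in for the network. The scratch directory and the sample files are made up for illustration, and the zero-filled file compresses far better than real data usually will, so treat the byte counts as a demonstration and not a benchmark.

```shell
# Build a small, highly compressible sample tree in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/source_dir" "$work/target_dir"
printf 'hello\n' > "$work/source_dir/a.txt"
head -c 1000000 /dev/zero > "$work/source_dir/zeros.bin"

# Point 1: count the bytes that would cross the wire with and without gzip.
raw_bytes=$( cd "$work" && tar cf - source_dir | wc -c )
gz_bytes=$( cd "$work" && tar cf - source_dir | gzip | wc -c )
echo "uncompressed stream: $raw_bytes bytes"
echo "compressed stream:   $gz_bytes bytes"

# Point 2: pack, compress, decompress, and unpack as one stream.
# All four stages run at the same time and no intermediate archive hits the disk.
( cd "$work" && tar cf - source_dir ) | gzip | gunzip | ( cd "$work/target_dir" && tar xf - )

# Confirm the copy is identical to the original.
diff -r "$work/source_dir" "$work/target_dir/source_dir" && echo "copy matches source"
```

Swapping the middle of that pipe for ssh gives the real commands; nothing else about the stream changes.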

How to tar something to a remote host.

tar cvf - source_dir | gzip | ssh target_host "cd target_dir && gunzip | tar xvf -"

How to tar something from a remote host.

ssh target_host "tar cvf - source_dir | gzip" | gunzip | tar xvf -

UPDATE:

Several people have suggested refinements. I haven’t tried all of these, but I agree that they sound like smart approaches:

Jerry Chen: you can use ssh’s -C compression flag, which accomplishes the same thing as gzip, I believe.
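A rough sketch of what the -C variant might look like, which I have not run against a real 250 GB transfer. The function name push_with_ssh_compression and its three arguments are made up for illustration; -C itself is a standard ssh option that compresses the connection, so the explicit gzip and gunzip stages drop out.

```shell
# Hypothetical helper: stream a directory to a remote host and let ssh
# compress the connection (-C) instead of piping through gzip.
#   $1 = directory to send
#   $2 = remote host
#   $3 = destination directory on the remote host
push_with_ssh_compression() {
    src_dir=$1
    host=$2
    dest_dir=$3
    # Assumes dest_dir already exists remotely and contains no single quotes.
    tar cf - "$src_dir" | ssh -C "$host" "cd '$dest_dir' && tar xf -"
}
```

Usage would be something like: push_with_ssh_compression source_dir target_host target_dir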

Nate True: I like to use Netcat instead of SSH to transfer the data - cuts out the overhead of encryption but takes more effort to set up.
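A hedged sketch of the Netcat approach follows. The function names and the port number in the usage line are my own placeholders, and netcat flags differ between implementations: some builds want "nc -l -p PORT" to listen, and some senders need an extra option (for example -N on OpenBSD netcat or -q 0 on traditional netcat) to exit once the input ends, so check your man page. The stream is not encrypted, so this only makes sense on a network you trust.

```shell
# Run on the receiving machine first: listen on a port and unpack whatever arrives.
#   $1 = port to listen on
#   $2 = existing directory to unpack into
# The body runs in a subshell so the cd does not affect the calling shell.
nc_receive() (
    port=$1
    dest_dir=$2
    # Assumption: this netcat accepts "nc -l PORT"; others need "nc -l -p PORT".
    cd "$dest_dir" && nc -l "$port" | gunzip | tar xf -
)

# Then run on the sending machine: pack, compress, and push to the listener.
#   $1 = directory to send
#   $2 = receiving host
#   $3 = port the receiver is listening on
nc_send() {
    tar cf - "$1" | gzip | nc "$2" "$3"
}
```

Usage would be something like "nc_receive 9000 target_dir" on the target, then "nc_send source_dir target_host 9000" on the source.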

Zachery Bir: can’t you just use cvfz/xvfz and bypass piping to gzip/gunzip altogether?
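For what it's worth, here is a sketch of that idea, assuming a tar that supports the z flag on both machines (GNU tar and bsdtar both do). The function names and arguments are hypothetical, and I dropped the v flag to keep the output quiet.

```shell
# Hypothetical helper: push a directory to a remote host, letting tar handle gzip itself.
#   $1 = directory to send
#   $2 = remote host
#   $3 = destination directory on the remote host
push_with_tar_z() {
    src_dir=$1
    host=$2
    dest_dir=$3
    # Assumes dest_dir already exists remotely and contains no single quotes.
    tar czf - "$src_dir" | ssh "$host" "cd '$dest_dir' && tar xzf -"
}

# Hypothetical helper: pull a directory from a remote host into the current directory.
#   $1 = remote host
#   $2 = directory on the remote host, relative to the remote login directory
pull_with_tar_z() {
    host=$1
    remote_dir=$2
    ssh "$host" "tar czf - '$remote_dir'" | tar xzf -
}
```

Usage would be something like "push_with_tar_z source_dir target_host target_dir" or "pull_with_tar_z target_host source_dir".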

Notes

  1. softarts posted this