scp is slow at copying lots of small files. That is probably because it opens an ssh connection and then runs a loop that opens a file locally, reads it, sends it, creates the file remotely, and starts over with the next one. I think the cost of opening and creating files this way is high and that tar is a lot more efficient at it (if you know exactly why, please post a comment). That’s why I often combine ssh with tar to copy a directory containing lots of small files (let’s say a website with a ton of 10 kB PHP files).
So, where I would typically use:
scp -r a-directory/ user@host:
I would use:
tar cfz - a-directory | ssh user@host "tar zxvf -"
First, I create a tar archive that is written to stdout (tar cf -, where “-” means stdout). Then I pipe that output into an ssh connection, which runs a command on the remote host that reads the archive from stdin and extracts it (tar xvf -, where “-” means stdin).
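The pipe itself can be tried out without a remote host. The following is only a minimal sketch: a local subshell with cd stands in for the ssh session, and the temporary directory layout and file names are made up for the test.

```shell
#!/bin/sh
# Local dry run of the tar pipe. Assumption: a subshell doing "cd" replaces
# the ssh connection, so no remote host is needed to try it.
set -e

work=$(mktemp -d)
mkdir -p "$work/src/a-directory" "$work/dest"

# Create a few small files to copy (hypothetical test data).
for i in 1 2 3 4 5; do
    printf 'file %s\n' "$i" > "$work/src/a-directory/file$i.php"
done

# Left side: write the archive to stdout ("-").
# Right side: read the archive from stdin ("-") and extract it.
(cd "$work/src" && tar cf - a-directory) | (cd "$work/dest" && tar xf -)

copied=$(find "$work/dest/a-directory" -type f | wc -l | tr -d ' ')
echo "copied $copied files"
```

Replacing the right-hand subshell with ssh user@host "tar xf -" gives the remote version of the same pipe.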
Of course, you can remove the “z” from both tar commands so the content isn’t compressed with gzip, or replace it with “j” to use bzip2 (with GNU tar).
That would be:
tar cf - a-directory | ssh user@host "tar xvf -"
or:
tar cfj - a-directory | ssh user@host "tar jxvf -"
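To get a feel for how much each option shrinks the stream before it reaches the network, you can count the bytes tar writes to stdout. This is a rough sketch with generated test files; stream_size is a hypothetical helper, and the bzip2 line only runs if bzip2 is installed.

```shell
#!/bin/sh
# Compare the size of the tar stream with and without compression.
set -e

work=$(mktemp -d)
mkdir "$work/a-directory"

# Generate 50 small, repetitive files (made-up test data that compresses well).
i=1
while [ "$i" -le 50 ]; do
    yes "<?php echo 'hello world'; ?>" | head -n 200 > "$work/a-directory/page$i.php"
    i=$((i + 1))
done

# Hypothetical helper: print the number of bytes tar writes to stdout
# for the given option bundle (cf, czf or cjf).
stream_size() {
    (cd "$work" && tar "$1" - a-directory) | wc -c | tr -d ' '
}

plain=$(stream_size cf)
gz=$(stream_size czf)
echo "no compression: $plain bytes"
echo "gzip:           $gz bytes"
if command -v bzip2 >/dev/null 2>&1; then
    echo "bzip2:          $(stream_size cjf) bytes"
fi
```

A smaller stream is not automatically a faster copy: the time spent compressing counts too, so it is worth timing the real transfer as well.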
Also, tar knows how to use ssh (this is GNU tar; depending on how it was built, you may need to add --rsh-command=ssh):
tar cvfz user@host:file.tar a-directory
However, this method has the drawback of creating a tar file on the remote host instead of copying the directory: you still need to extract the archive on the other side.
Now some numbers. This is not a benchmark, just an indication: I had the idea of writing this article while copying a directory I actually needed to copy. Numbers will vary with the size of the files, the number of files, local CPU, remote CPU, network bandwidth, and probably other parameters:
# du -sh a-directory
# find a-directory -type f |wc -l
time with gzip:
time with bzip:
time without compression:
tar with user@remote:file.tar
Conclusion: in this case (and only in this case), ssh with tar gives a speedup of around 19x (with the gzip method).
Here, gzip is the winner because my servers have a lot of bandwidth (the copy goes over the internet, but between two hosted servers). bzip2 seems to use too much CPU and lags behind, gzip seems to compress efficiently with no noticeable delay, and skipping compression increases the network transfer time.
That’s it, have fun!