
scp with little files, use ssh and tar

scp is slow at copying a lot of little files. It’s probably because it opens an ssh connection and then runs a loop: open a file locally, read it, send it, create the file remotely, and start over with the next file. I think the per-file cost of opening and creating files (and of waiting on each file’s round trip) is high, and tar, which streams everything as one continuous archive, is a lot more efficient at it (if you know exactly why, please post a comment). That’s why I often use ssh in combination with tar to copy a directory with a lot of little files (let’s say a website with a ton of 10 kB PHP files).

To achieve this, where I typically use:
scp -r a-directory/ user@host:
I would use:
tar czf - a-directory | ssh user@host "tar xzvf -"
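
If the files should land somewhere other than the remote home directory, one variant is to tell the remote tar to extract with -C. This is just a sketch; the destination /var/www is a hypothetical example, not anything from the original setup:
tar czf - a-directory | ssh user@host "tar xzvf - -C /var/www"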

First, I create a tar archive that outputs to stdout (tar cf -, where “-” means output to stdout), then I pipe that output over an ssh connection to a command that reads stdin and passes it to tar (tar xvf -, where “-” means read from stdin).
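
The same trick works in the other direction, to pull a directory from the remote host down to the local machine; here is a sketch with the same example names as above:
ssh user@host "tar czf - a-directory" | tar xzvf -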

Of course, you can remove the “z” from both tar commands so the content won’t be compressed with gzip, or replace it with “j” to use bzip2 (with GNU tar).

That would be:

tar cf - a-directory | ssh user@host "tar xvf -"
or
tar cjf - a-directory | ssh user@host "tar xjvf -"

Also, GNU tar knows how to write an archive directly on a remote host (it goes through a remote shell, rsh by default, which you can point at ssh):
tar czvf user@host:file.tar a-directory
but this method has the drawback of creating a tar file remotely, not copying the directory; you still need to extract the tar on the other side.
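
As a sketch of the ssh variant of that (assuming GNU tar, whose remote-archive support goes through the rmt protocol, so the remote host also needs an rmt program available):
tar czvf user@host:file.tar --rsh-command=/usr/bin/ssh a-directory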

Now some numbers. This is not a benchmark, just an indication: I had the idea of writing this article while copying a directory I needed to copy anyway. Numbers may vary with the size of the files, the number of files, local CPU, remote CPU, network bandwidth and probably other parameters:


# du -sh a-directory
109M a-directory
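
To compare the two approaches yourself, a minimal sketch (assuming bash, where the time keyword times a whole pipeline; same directory and host as above):
# time scp -r a-directory/ user@host:
# time tar czf - a-directory | ssh user@host "tar xzf -"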