scp with little files, use ssh and tar

scp is slow at copying a lot of little files. That is probably because it opens one ssh connection and then runs a loop: open the file locally, read it, send it, create the file remotely, and start over with the next file. I think the cost of opening and creating files one at a time is high and that tar is a lot more efficient at it (if you know exactly why, please post a comment). That’s why I often combine ssh and tar to copy a directory with a lot of little files (let’s say a website with a ton of 10 kB PHP files).

To achieve this, where I would typically use:
scp -r a-directory/ user@host:
I use instead:
tar cfz - a-directory | ssh user@host "tar zxvf -"

First, I create a tar archive that is written to stdout (tar cf -, where “-” means stdout). Then I pipe that output to an ssh connection, which runs a command on the remote host that reads the archive from stdin and extracts it (tar xvf -, where “-” means stdin).
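
To see the plumbing without needing a remote host, the same pipeline can be tried locally. This is only a minimal sketch: `sh -c` stands in for `ssh user@host`, and the temporary directories and file names are made up for the illustration.

```shell
# Build a throwaway tree with two small files (names are illustrative only).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/a-directory/sub"
echo "hello" > "$src/a-directory/index.php"
echo "world" > "$src/a-directory/sub/page.php"

# Left side: tar writes the archive to stdout ("f -").
# Right side: a second tar reads the archive from stdin and unpacks it.
# `sh -c` plays the role of `ssh user@host`, so nothing leaves the machine.
tar cf - -C "$src" a-directory | sh -c "cd '$dst' && tar xf -"

# diff prints nothing when the two trees are identical.
diff -r "$src/a-directory" "$dst/a-directory" && echo "trees match"
```

Replacing `sh -c` with `ssh user@host` should give the real copy, with the archive travelling over the ssh connection instead of a local pipe.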

Of course, you can remove the “z” from both tar commands so the content won’t be compressed with gzip, or replace it with “j” to use bzip2 (with GNU tar).

That would be:

tar cf - a-directory | ssh user@host "tar xvf -"
or
tar cfj - a-directory | ssh user@host "tar jxvf -"
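
Whether compression pays off depends on how well the files compress. One rough way to check before copying is to count the bytes each variant would put on the wire. The sketch below uses a generated sample directory (hypothetical content, not the site measured in this post) and compares only the uncompressed and gzip streams; the “j” variant can be checked the same way if bzip2 is installed.

```shell
# Sample tree: 50 small, repetitive text files (illustrative content only).
work=$(mktemp -d)
mkdir "$work/a-directory"
i=1
while [ "$i" -le 50 ]; do
    printf '<?php echo "page %s"; ?>\n' "$i" > "$work/a-directory/page$i.php"
    i=$((i + 1))
done

# Count the bytes of each archive stream instead of sending it anywhere.
# tr strips the padding some wc implementations add around the number.
plain_bytes=$(tar cf - -C "$work" a-directory | wc -c | tr -d ' ')
gzip_bytes=$(tar czf - -C "$work" a-directory | wc -c | tr -d ' ')

echo "uncompressed: $plain_bytes bytes"
echo "gzip:         $gzip_bytes bytes"
```

Fewer bytes does not automatically mean a faster copy, since the compressor costs CPU time on both ends.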

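Also, tar knows how to write to a remote archive:
tar cvfz user@host:file.tar a-directory
(GNU tar goes through a remote shell for this; depending on the build that may be rsh rather than ssh, in which case --rsh-command=/usr/bin/ssh selects ssh.) This method has the drawback of creating a tar file remotely rather than copying the directory: you still need to extract the archive on the other side.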
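
If a remote archive is what you actually want, the stdin trick also works without relying on tar's remote-archive support: the far side simply saves the stream to a file. A sketch under the same assumptions as before, with `sh -c` standing in for `ssh user@host` and made-up paths:

```shell
src=$(mktemp -d)
remote=$(mktemp -d)   # pretend this directory lives on the other host
mkdir "$src/a-directory"
echo "hello" > "$src/a-directory/index.php"

# Compress locally, ship the stream, and let cat write it to a file unchanged.
tar czf - -C "$src" a-directory | sh -c "cat > '$remote/file.tar.gz'"

# List the archive on the "remote" side to confirm it arrived intact.
listing=$(tar tzf "$remote/file.tar.gz")
echo "$listing"
```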

Now some numbers. This is not a benchmark, just an indication: I had the idea of writing this article while copying a directory I needed to copy. Numbers will vary with the size of the files, the number of files, local CPU, remote CPU, network bandwidth and probably other parameters:


# du -sh a-directory
109M a-directory

# find a-directory -type f |wc -l
9992

time with gzip:
real 0m16.466s
user 0m4.230s
sys 0m0.300s

time with bzip2:
real 0m28.943s
user 0m21.390s
sys 0m0.390s

time without compression:
real 0m24.566s
user 0m1.410s
sys 0m0.390s

tar with user@remote:file.tar:
real 0m32.891s
user 0m4.100s
sys 0m0.370s

simple scp:
real 5m1.188s
user 0m1.770s
sys 0m1.320s
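
To reproduce this kind of measurement, each variant can be wrapped in a small timing helper. The sketch below is an assumption about how one might do it, not how the numbers above were taken: `time_copy` and `REMOTE` are hypothetical names, `REMOTE` defaults to `sh -c` so the script runs locally, and `date +%s` only resolves whole seconds.

```shell
# Set REMOTE="ssh user@host" for a real run; the default keeps everything local.
REMOTE=${REMOTE:-sh -c}

# time_copy CREATE_FLAGS EXTRACT_FLAGS SRC_PARENT DIR DEST
# Runs one tar-over-pipe copy and prints the elapsed wall-clock seconds.
time_copy() {
    start=$(date +%s)
    # $REMOTE is left unquoted on purpose so "ssh user@host" splits into words.
    tar "$1" - -C "$3" "$4" | $REMOTE "cd '$5' && tar $2 -"
    end=$(date +%s)
    echo $((end - start))
}

# Tiny sample tree (illustrative only); use a real directory for real numbers.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/a-directory" "$dst/plain" "$dst/gzip"
echo "hello" > "$src/a-directory/index.php"

plain_secs=$(time_copy cf xf "$src" a-directory "$dst/plain")
gzip_secs=$(time_copy czf xzf "$src" a-directory "$dst/gzip")
echo "no compression: ${plain_secs}s, gzip: ${gzip_secs}s"
```

In bash, prefixing the whole pipeline with the `time` keyword reports real/user/sys figures like the ones above, since the keyword covers the entire pipeline rather than only its first command.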

Conclusion: in that case (and only in that case), ssh and tar give a speedup of around 18x (with the gzip method).
Here, gzip is the winner. My servers have a lot of bandwidth (the copy goes over the internet, but between 2 hosted servers), so bzip2 seems to consume too much CPU and lags behind, while gzip compresses efficiently enough with no noticeable delay. The no-compression method increases network transfer time.
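
The speedup is simply the scp wall-clock time divided by the wall-clock time of each tar variant. A small sketch of that arithmetic, using the “real” times listed above (the `ratio` helper is a made-up name for the illustration):

```shell
# Wall-clock ("real") times from the runs above, converted to seconds.
scp_secs=301.188     # 5m1.188s
gzip_secs=16.466
bzip2_secs=28.943
plain_secs=24.566

# awk does the floating-point division; LC_ALL=C keeps "." as decimal point.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

gzip_speedup=$(ratio "$scp_secs" "$gzip_secs")
echo "gzip:           ${gzip_speedup}x"
echo "bzip2:          $(ratio "$scp_secs" "$bzip2_secs")x"
echo "no compression: $(ratio "$scp_secs" "$plain_secs")x"
```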

That’s it, have fun!

3 Comments »

  1. george said

    Hi,
    you seem to be an expert in tar.
    You came up in google when I searched for a solution to my problem:
    I would like to subset a tar file without extracting to disk.

    but something like:
    tar -xf big.tar --wildcards --no-anchored '*selection*' | tar --create -f - --file=subset.tar

    does not work.

  2. Bob Dong said

    Hay.

    I want to dump a tar file on the target machine. Too bad this doesn’t work:

    tar -cf - /etc | ssh zim@remotehost 'tar -cf /home/moo/etc.tar'

    In other words, I do NOT want to extract the tarball on the target box; just transfer the tarball intact. Maybe I need to use backticks. (``)

    I know there is some way to do this. And I know that rsync is the answer but I want to crack this nut.

  3. smaftoul said

    tar -cf - /etc | ssh zim@remotehost "cat - > /home/moo/etc.tar"

    You are not “tarring” on the remote host:
    You produce the tar locally and put the stream on stdout (tar -), pipe it to ssh so the tar stream goes to the remote host, then capture the stream (whatever stream it is: a file, a tar, a tar.gz) and redirect it to a file on the remote host!
