Using rsync

17.04.18    tools and tricks    network

The command-line tool rsync, available in most Linux distributions, is a powerful tool for synchronizing files and directories. It keeps mirrors of directories up to date with minimal effort and data transfer, including copies on remote machines and web servers.

Keeping files synchronized is by nature an ongoing task: it is defined once by a fixed set of parameters and then repeated again and again. This is the kind of task GUIs are rather bad at. GUIs are nice for exploring file hierarchies and doing simple operations, but CLI utilities are much better for well-defined and repeatable tasks. That's why rsync is so nice.

rsync's general invocation syntax is:

rsync [options...] source [source...] destination

General options

Keep in mind that rsync is a sharp tool. It does not ask for confirmation before overwriting or deleting files or whole directories, and it offers a multitude of options which can change its behaviour in unexpected ways. This is why -n and -v are the most important options. -n is the dry-run option: nothing in the file system is changed. Combined with verbose output (-v) it shows what rsync would do. Once that looks right, repeat the command without -n to do the actual work.

Another very useful option is -a. Its mnemonic is "archive" mode, and it is a shorthand for several more fine-grained options. It is about the closest thing to a carbon copy of files: it recurses into directories and preserves symlinks, permissions, time stamps, owners etc. Hard links, ACLs and extended attributes are not included and need -H, -A and -X in addition.

You should also have a look at the man page, which is very informative and explains the options in depth. rsync has many knobs to turn, which makes it adaptable to a multitude of use cases.

Directory contents vs directory as a whole

One thing to keep in mind with rsync is the significance of the trailing slash on directory paths. A directory path can be written with a trailing slash (some/path/) or without it (some/path). Most other file utilities treat both the same, apart from corner cases (nonexistent paths, symlinks etc.). With rsync, however, the difference is consistent and matters.

Without the slash (some/path) a source directory is used as a whole, including the directory itself; with the slash (some/path/) only the contents of path are used. When doing for example:

rsync -a some/source/example some/destination/

rsync copies the directory example itself, creating (or updating) some/destination/example. In contrast, when doing:

rsync -a some/source/example/ some/destination/

rsync synchronizes only the contents of example into destination: some/source/example/file, for instance, ends up as some/destination/file, and no directory some/destination/example is created.

The trailing slash of the destination can matter too. If the source is a single file, the trailing slash of the destination determines whether the destination is treated as a directory or as a file name. With the slash, the file is copied into that directory. Without the slash, the file is copied and renamed to the exact name given as destination, as long as no directory of that name exists already (an existing directory always receives the file inside it).

# some/source/file.txt -> some/destination/file.txt
rsync -a some/source/file.txt some/destination/

# some/source/file.txt -> some/destination
rsync -a some/source/file.txt some/destination

Synchronisation

Up to this point rsync is just a powerful copy tool. But it is not named rcopy, after all. rsync can be used effectively to synchronize directories, that is, to keep mirrors of data up to date. The mirror directory is created and updated by a single command, which can simply be repeated every time it is needed (by hand or by cron).

By default rsync already skips files whose size and modification time are the same in source and destination (its "quick check"). Since -a includes -t, time stamps are preserved when copying, so this works out nicely: make one copy of a directory, add something to the source directory, and a second run copies only the new files (or old files that have been modified). The -u option (think update) goes one step further: it skips files whose copy in the destination is newer than the source, so changes made in the destination are not overwritten.

The -c option works analogously, but it compares checksums instead of modification times and sizes. A file in the destination is only overwritten if its contents actually differ (modulo an infinitesimal chance of a hash collision); time stamps play no part in that decision. This is useful if a build system unconditionally regenerates and overwrites artifacts which you want to mirror somewhere. Syncing with -u would not help there, because the build touched every file. With -c only the files whose contents changed are rewritten, so tools which monitor the destination only pick up actual changes. Note that when -t is in effect (as it is with -a), rsync may still update the time stamps of the unchanged files.

Moreover, rsync can also clean up directories. Normally, files in the destination that have no counterpart in the source are left alone. But this means that files deleted from the source stay in the destination indefinitely until they are removed manually, and for many use cases the destination is then no longer a mirror of the source. Therefore rsync features the --delete family of options. These, well, delete files (and directories) in the destination for which no corresponding source exists.

SSH Connection

One of the best things about rsync is that it also works over SSH. Either the source or the destination (but not both) can be on a remote machine: just write it as user@host:/some/path. After you have entered the password, all of rsync's features work across the connection. This makes it very convenient to work on something locally and then deploy it to a remote machine, touching only the files that actually need to change.
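A minimal sketch of a deploy helper, assuming a POSIX shell. The function name deploy_site, the local directory and the remote target in the example call are hypothetical; the block only defines the function, since running it needs a reachable SSH host. ssh is already the default remote shell in current rsync versions, so -e ssh only makes it explicit; it is also where ssh options would go, e.g. -e 'ssh -p 2222'.

```shell
#!/bin/sh
deploy_site() {
    # $1: local directory (with trailing slash: sync its contents)
    # $2: remote target in rsync's user@host:/path form
    # -a: archive mode, -v: list transferred files, -z: compress in transit
    rsync -avz -e ssh "$1" "$2"
}

# Hypothetical usage; add -n to the rsync options first for a dry run:
#   deploy_site ./public/ alice@example.com:/var/www/site/
```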

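Sadly, apart from its own daemon protocol, rsync can only reach remote machines through a remote shell such as SSH. Using FTP, for example, is not directly possible. But in typical Unix manner this can be outsourced to another tool: other remote file systems can be mounted into the local file system, and there are different tools for the different connection types. rsync then treats the mount like a local directory, so most of its features work. Keep in mind, though, that rsync does not use its delta-transfer algorithm for local copies, and options like -c have to read every file in full over the network.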