The command line tool
rsync, commonly found in many Linux distributions, is a powerful tool for synchronizing files and directories. It is useful for keeping mirrors of directories up to date with minimal effort and data transfer. This includes copies on remote machines and web servers.
Keeping files synchronized is by nature an ongoing task. It is defined once by a fixed set of parameters and then repeated again and again. This is the kind of task GUIs are rather bad at. GUIs are nice for exploring file hierarchies and doing simple operations. CLI utilities are much better for well-defined and/or repeatable tasks.
rsync is so nice.
rsync's general invocation syntax is:
rsync [options...] source [source...] destination
Keep in mind that
rsync is a sharp tool. It might not ask for confirmation when overwriting or deleting files or whole directories. It offers a multitude of options which might change the behaviour in unexpected ways. This is why the options -n and -v are the most important ones.
-n is the dry-run option: no change to the file system will happen. Combined with verbose output (-v), it shows what rsync would do. The actual work is then done by repeating the command without -n.
Another very useful option is
-a. It is a shorthand for many other, more fine-grained options. Its mnemonic is "archive" mode. It is about the closest thing to a carbon copy for files. It recurses into directories and preserves permissions, timestamps, owners etc.
You should also have a look into the man page which is very informative and explains the options in depth.
rsync contains many knobs to turn, making it adaptable to a multitude of use cases.
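The dry-run workflow described above can be tried safely in a scratch directory. The following is a minimal sketch; the directory layout under `$work` and the file name are made up for illustration.

```shell
# Scratch directories so the example cannot touch real data.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "hello" > "$work/src/sub/file.txt"

# Dry run: -n makes no changes, -v lists what would be transferred.
rsync -a -n -v "$work/src/" "$work/dst/"

# The destination should still be empty at this point.
after_dry_run=$(ls -A "$work/dst")
echo "entries after dry run: '$after_dry_run'"

# The same command without -n does the actual copy.
rsync -a -v "$work/src/" "$work/dst/"
ls -R "$work/dst"
```

Running the dry run first costs almost nothing and is the easiest way to catch a wrong path before any file is written.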
Directory contents vs directory as a whole
One important thing to note when using
rsync is the importance of the trailing slash of directories. When specifying paths to directories there are two possibilities: to use a trailing slash (
some/path/) or to omit it (
some/path). Most other file utilities ignore the difference; sometimes it controls the behaviour in corner cases (paths not existing, symlinks etc.). But with
rsync there is a consistent difference.
With some/path the directory is used as a whole including its root node, whereas
some/path/ uses the contents of the
path directory as the source or destination. When doing for example:
rsync -a some/source/example some/destination/
rsync will copy/synchronize the directory
example to a (new) directory
some/destination/example. When doing:
rsync -a some/source/example/ some/destination/
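The tool will synchronize the contents of the directory example directly with the directory destination. The contents of example (e.g. some/source/example/file) are copied into some/destination/ itself (giving some/destination/file).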
The trailing slash of the destination can also influence the behaviour. If the source is a single file and the destination does not exist yet, the trailing slash of the destination determines whether it is treated as a file or a directory. With the slash, the file will be copied into that directory. Without the slash, the file will be copied and renamed to the exact file name given as destination. (If the destination already exists as a directory, the file is copied into it either way.)
# some/source/file.txt -> some/destination/file.txt
rsync -a some/source/file.txt some/destination/
# some/source/file.txt -> some/destination
rsync -a some/source/file.txt some/destination
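The trailing-slash rules are easiest to internalize by running all variants side by side. This is a small sketch in a temporary directory; the target directories `a`, `b` and `c` and the names `newdir` and `renamed` are hypothetical and exist only for this demonstration.

```shell
work=$(mktemp -d)
mkdir -p "$work/some/source/example" "$work/a" "$work/b" "$work/c"
echo "data" > "$work/some/source/example/file"

# Source without trailing slash: the directory itself is copied.
rsync -a "$work/some/source/example" "$work/a/"

# Source with trailing slash: only its contents are copied.
rsync -a "$work/some/source/example/" "$work/b/"

# Single file, destination with trailing slash: treated as a directory.
rsync -a "$work/some/source/example/file" "$work/c/newdir/"

# Single file, not yet existing destination without slash: treated as a file name.
rsync -a "$work/some/source/example/file" "$work/c/renamed"

# Show where the copies ended up.
find "$work/a" "$work/b" "$work/c" -type f | sort
```

The listing at the end shows one file below `a/example/`, one directly in `b/`, one in `c/newdir/`, and the plain file `c/renamed`.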
Up to this point
rsync is just a powerful copy-tool. But it is not named
rcopy after all.
rsync can be used effectively to synchronize directories, that is to keep mirrors of data up-to-date. The mirror directory can be created and updated with a single command which can just be repeated every time it is needed (by hand or by cron).
The -u option (think update) only overwrites files if the source file is newer than the destination. In particular, if the destination has the same time stamp, the file will not be copied. This is very nice because the
-a option includes
-t: time stamps are preserved when copying. So you can make one copy of a directory. Then add something to the source directory. When doing the second copy with
-u only the new files (or old files that have been modified) will be copied.
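A repeated update run might look like the following sketch. The file names are invented, and the old time stamp is set explicitly with `touch -t` so that the comparison does not depend on how fast the script runs. Besides the new file being copied, it also shows the case in which the destination copy is the newer one and is therefore left alone.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/mirror"
echo "v1" > "$work/src/old.txt"
# Give the source file a clearly old modification time.
touch -t 202001010000 "$work/src/old.txt"

# First copy: -a preserves the time stamp in the mirror.
rsync -a "$work/src/" "$work/mirror/"

# The mirror copy is edited later, so it is now newer than the source.
echo "edited in mirror" > "$work/mirror/old.txt"
# Meanwhile a new file appears in the source.
echo "fresh" > "$work/src/new.txt"

# Second run with -u: new.txt is copied, the newer old.txt in the mirror is kept.
rsync -a -u -v "$work/src/" "$work/mirror/"
cat "$work/mirror/old.txt"
```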
The -c option works analogously to the
-u option. But it uses checksums instead of the modification time stamp. Files in the destination are only overwritten if they are actually different (modulo some infinitesimal crypto-collision chance). Metadata including time stamps is ignored and, when the file contents are not overwritten, not changed either.
This option is useful if you have a build system unconditionally regenerating and overwriting some artifacts which you want to mirror somewhere. If the synchronisation is done with
-c (instead of
-u which would have no effect because the build system touched everything), tools which monitor file system effects or modification times in the destination will only pick up the actual changes.
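The build-system scenario can be simulated as below. This is only a sketch under some assumptions: the directory and file names are invented, and it uses `-rc` rather than `-ac`, because to my understanding `-a` (which includes `-t`) would still bring the time stamps of otherwise identical files in line with the source. A reference file `stamp` is used to detect which mirror files were really rewritten.

```shell
work=$(mktemp -d)
mkdir -p "$work/build" "$work/mirror"
echo "unchanged" > "$work/build/same.txt"
echo "old" > "$work/build/changed.txt"
rsync -rc "$work/build/" "$work/mirror/"

# Age the mirror files and create a newer reference stamp.
touch -t 202001010000 "$work/mirror/same.txt" "$work/mirror/changed.txt"
touch -t 202001020000 "$work/stamp"

# Simulated rebuild: every artifact is rewritten, but only one really changes.
echo "unchanged" > "$work/build/same.txt"
echo "new" > "$work/build/changed.txt"

# Checksum-based sync: only files with different contents are overwritten.
rsync -rc "$work/build/" "$work/mirror/"

# List the mirror files modified after the stamp.
touched=$(find "$work/mirror" -type f -newer "$work/stamp")
echo "$touched"
```

Only `changed.txt` should be listed, which is what a tool watching modification times in the mirror would see as well.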
rsync can also clean up directories. Normally files not "mentioned" in the source are ignored in the destination. But this implies that files which are deleted in the source at some point will stay in the destination indefinitely until removed manually. For many use cases the destination is then not considered a mirror of the source anymore.
rsync features the
--delete family of options. These, well, delete files (and directories) in the destination directories for which no corresponding source exists.
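The difference between a plain sync and one with `--delete` can be seen in a short experiment. The sketch below uses invented file names in a temporary directory and previews the deletion with -n before doing it for real.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/mirror"
echo "keep" > "$work/src/keep.txt"
echo "temp" > "$work/src/obsolete.txt"
rsync -a "$work/src/" "$work/mirror/"

# The file disappears from the source.
rm "$work/src/obsolete.txt"

# A plain sync leaves the stale copy in the mirror.
rsync -a "$work/src/" "$work/mirror/"
survived=no
[ -e "$work/mirror/obsolete.txt" ] && survived=yes
echo "stale file survived plain sync: $survived"

# Preview the deletion, then perform it.
rsync -a -n -v --delete "$work/src/" "$work/mirror/"
rsync -a --delete "$work/src/" "$work/mirror/"
ls "$work/mirror"
```

Because `--delete` removes data in the destination, the dry run in between is worth the extra line.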
One of the best things is that
rsync can also work over SSH. Either source or destination can be a remote SSH connection (but not both). Just use the syntax
user@domain:/some/path for the source or the destination. After being asked for the password all of the features of
rsync work with the remote connection. So this is very nice for working on something locally and then deploying it onto some remote machine. And only the files that need to be changed are touched.
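A deployment over SSH could be prepared like this. All values are placeholders: the account `user@example.com`, the remote path `/var/www/site/` and the local directory `./site/` are hypothetical and need to be replaced. The sketch only assembles and prints the command so that it can be checked before it is run against a real server.

```shell
# Hypothetical values: replace with your own account, host and paths.
remote_host="user@example.com"
remote_path="/var/www/site/"
local_path="./site/"

# Assemble the command once so it can be inspected before it is executed.
# Adding -n to the options turns it into a dry run against the remote side.
deploy_cmd="rsync -av --delete $local_path $remote_host:$remote_path"

# Print the command; paste it into the shell once it looks right.
echo "$deploy_cmd"
```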
Sadly, next to its own special protocol,
rsync only supports SSH.
Using FTP for example is not easily possible. But in typical Unix manner this can be outsourced to another tool. Other remote file systems can be mounted into the local virtual file system, and different tools exist for the different connection types. On such a mounted directory, most of
rsync's features should perform well.