rsync
Performance Improvement: --checksum / Check for absence of target first
Current State
When using rsync with the --checksum flag to copy a bunch of big files to an empty directory, rsync apparently calculates the checksum of the big files before noticing that the target files do not exist.
Performance Improvement
Instead of blindly spending resources on calculating the checksums, first check whether the target exists at all. If the target is absent, the file has to be transferred anyway, so the checksum calculation can be skipped.
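To illustrate the ordering I have in mind, here is a minimal local sketch in Python. It is not rsync's code and does not model the sender/receiver split; `needs_transfer` and `file_digest` are hypothetical names, and SHA-256 is only a stand-in for whatever checksum rsync negotiates. The point is just that the cheap checks (existence, then size) run before any file content is read.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()  # stand-in algorithm, an assumption for this sketch
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src: Path, dst: Path) -> bool:
    """Decide whether src must be copied to dst, cheapest checks first."""
    # 1. Target missing: transfer is unavoidable, no checksum needed.
    if not dst.exists():
        return True
    # 2. Sizes differ: contents cannot match, still no checksum needed.
    if src.stat().st_size != dst.stat().st_size:
        return True
    # 3. Only now pay for reading both files in full.
    return file_digest(src) != file_digest(dst)
```

With an empty target directory, every file takes the first branch, so no checksum would be computed before the transfer starts.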
Command Used
rsync --recursive --noatime --preallocate --partial --verbose --progress --checksum fooA fooB
This requires a pretty big protocol change, but is already on my list of desired future improvements if I ever get around to an rsync 4.x or similar big update.
If this gets added, please add an option to keep the current behavior. I'm using rsync in a custom archive project where I need the hash of a file as it's transferred to the archive location. Using rsync with --checksum and --itemize-changes to send the file and compute the hash at the same time is ideal for this purpose. In this use case, there are never any matching destination files, but I'm able to extract the hash from the itemized output and record it for future error checking of the file on the archive machine.