
notes / behaviors / inconsistencies when using --remove-source-files

Open malventano opened this issue 4 years ago • 2 comments

I was previously on 3.1.3 and would transfer (move) sets of large files to another location with --remove-source-files. I recently built 3.2.3-65 in order to address issue 71 (not terminating on destination full), but then I noticed a difference in the --remove-source-files behavior.

On both versions, --remove-source-files appears to remove source files only in batches (roughly every 10 files) rather than as each file completes. On 3.1.3, canceling the transfer would finish removing the most recent set of already-transferred files. On 3.2.3, canceling the transfer leaves those recently transferred files in place, requiring the user to reissue the command and cancel it again in order to remove them.

Recommendation: --remove-source-files should remove the source files as they are transferred and without any delays.

This will help in cases where the user is attempting to clear space from a volume as it is being filled from another source. Otherwise, the user has to occasionally cancel and restart the transfer by hand, because already-transferred files remain on that volume for an extended period of time, especially with a slow or intentionally bandwidth-limited transfer. Further, with the current behavior on 3.2.3, the user is left with files in both locations that they would have assumed were moved given the selected options. Canceling the operation should not leave duplicates behind; it did not on 3.1.3.
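Until rsync itself behaves this way, one possible workaround is to invoke rsync once per file, so that each source file should be gone by the time its own rsync process exits. The following is only a sketch under stated assumptions: `build_move_command` and `move_one_at_a_time` are hypothetical helper names, the source is assumed to be a flat directory of regular files, and only the `-a`, `--remove-source-files`, and `--bwlimit` options are used.

```python
import subprocess
from pathlib import Path


def build_move_command(src_file, dest_dir, bwlimit=None):
    """Build an rsync argument list that moves exactly one file (hypothetical helper)."""
    cmd = ["rsync", "-a", "--remove-source-files"]
    if bwlimit is not None:
        cmd.append(f"--bwlimit={bwlimit}")
    # A trailing slash makes rsync treat the destination as a directory.
    cmd += [str(src_file), str(dest_dir).rstrip("/") + "/"]
    return cmd


def move_one_at_a_time(src_dir, dest_dir, bwlimit=None):
    """Run one rsync per regular file in src_dir, stopping at the first failure."""
    for src_file in sorted(Path(src_dir).iterdir()):
        if not src_file.is_file():
            continue
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the loop does not continue on to the next file after an error.
        subprocess.run(build_move_command(src_file, dest_dir, bwlimit), check=True)


# Example call (paths are placeholders):
# move_one_at_a_time("/mnt/source", "/mnt/dest", bwlimit="100m")
```

The trade-off is that each file pays rsync's startup cost and nothing is prepared ahead for the next file, which matters little for a handful of ~100GB files but would be slow for many small ones.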

malventano avatar Jun 26 '21 19:06 malventano

Rsync executes its actions in a pipeline of data, so the messages about what files are done are often delayed behind copious amounts of checksum data that is also making its way through the socket. There's not much that can be done about that without introducing some kind of command socket separate from the data socket, which is not something that is likely to happen anytime soon (and making it not pre-generate data for upcoming files would slow down the file transfer rate).

If you interrupt any transfer, it's not possible to know how much of the in-flight data makes it through before the socket closes down. I imagine that the change you're seeing is that newer rsyncs try to interrupt the socket sooner than the older ones did, as this helps to avoid having rsync try to finish off a huge multi-GB file transfer when you told it you want to abort. The downside is that the sooner the socket closes down, the fewer dangling the-file-is-done messages can make it through.

Perhaps the --remove-source-files option should make rsync use the slower, more complete shutdown method, so I will consider that.
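The difference between an early close and a complete shutdown can be illustrated with a toy model. This is not rsync's wire protocol or code; it only assumes a single first-in, first-out stream in which per-file "done" notices queue behind bulk data, and the function name and message tuples are made up for illustration.

```python
from collections import deque


def removed_before_close(stream, messages_processed):
    """Return the files whose "done" notice was read before the stream closed.

    stream: list of (kind, filename) tuples in send order, kind is "data" or "done".
    messages_processed: how many messages get through before the close.
    """
    pipe = deque(stream)
    removed = []
    for _ in range(min(messages_processed, len(pipe))):
        kind, filename = pipe.popleft()
        if kind == "done":
            # Only a "done" notice that actually arrives triggers a source removal.
            removed.append(filename)
    return removed


# "done" notices for b.bin and c.bin sit behind data for later files.
stream = [
    ("data", "a.bin"), ("done", "a.bin"),
    ("data", "b.bin"), ("data", "c.bin"),
    ("done", "b.bin"), ("done", "c.bin"),
]

early = removed_before_close(stream, 3)           # close after three messages
full = removed_before_close(stream, len(stream))  # drain everything first
print(early)  # ['a.bin']
print(full)   # ['a.bin', 'b.bin', 'c.bin']
```

In this model, closing early loses the notices for files whose data already went through, which matches the duplicates seen after canceling, while draining the stream first removes all of them at the cost of a slower shutdown.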

WayneD avatar Jun 27 '21 19:06 WayneD

This is a --bwlimit=100m transfer of several ~100GB files to another local drive. I would not expect checksum data to cause over an hour delay in the removal of the first source file. The delay is so large that the removed files are running >1TB (>10 files) behind the current state of the transfer. The transfer is occurring across an unsaturated path on an otherwise idle system. In this config, the source is capable of >1GB/s (SSD array) and the destination >120 MB/s (single 12TB HDD). Any other data should have no issue transferring through any pipes or sockets in this case, which had led me to believe that there was some file-count-based delay in deletions at play.

An additional observation on the current behavior: when the destination drive is full, the last few files are removed from the source the instant the 'No space left on device (28)' error is thrown. After throwing the error, though, rsync immediately starts attempting to transfer another file, with no change in destination free space when watching df. I'm not sure what it is doing at that point; I would have expected it to abort and return to the prompt after throwing the errors. The same thing happens if a drive is failing and switches into read-only mode during the copy: rsync throws the error and then starts on the next file (not actually transferring anywhere) without aborting.
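A script driving the transfer could sidestep the full-destination case by checking free space before starting each file. The sketch below is an assumption-laden illustration, not something rsync does: `has_room` is a hypothetical helper built on Python's `shutil.disk_usage`, and the `reserve_bytes` margin is an arbitrary parameter.

```python
import shutil


def has_room(dest_dir, file_size, reserve_bytes=0):
    """Return True if dest_dir's filesystem can hold file_size plus a safety margin."""
    # disk_usage reports total, used, and free bytes for the filesystem holding dest_dir.
    free = shutil.disk_usage(dest_dir).free
    return free >= file_size + reserve_bytes


# Example: refuse to start a ~100GB file unless it fits with 1 GiB to spare.
# if not has_room("/mnt/dest", 100 * 10**9, reserve_bytes=2**30):
#     raise SystemExit("destination is too full for the next file")
```

A check like this only helps with running out of space; it would not detect a drive that drops into read-only mode partway through a copy.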

malventano avatar Jun 27 '21 21:06 malventano