Resume partial transfers for local syncs
Closes #570
I've opened the PR as a draft, not because the code isn't finished, but because I think more discussion of the (potential) cons of this change may be needed.
While the resume logic exists for remote transfers, it has never been used for local transfers. Maybe there is (or was historically) a good reason for that. Resuming partials does require digesting the entire contents of the partial transfer, and perhaps this was seen as unnecessarily expensive locally, where a re-copy would be just as fast or even faster. However, I don't believe that reasoning holds. There are many local sync scenarios where resuming makes sense: different disks, network shares, and local "cloud" mounts as mentioned in #570, to name a few.
Usually a local transfer would happen like this (please correct me if I got it wrong):
- read the entire source file to fingerprint it,
- copy the file (once more read the entire file and write it),
- read the entire target file to fingerprint it and compare fingerprints (so-called paranoid check).
With this patch a local transfer will happen like this:
- read the entire source file to fingerprint it,
- read the entire partial target file to fingerprint it, compare fingerprints and discover they are the same - no copy needed,
- read the entire target file to fingerprint it and compare fingerprints (so-called paranoid check).
In sum, the total number of reads remains the same and the write is skipped completely. Already a win! But one read is now done in the target replica instead of the source replica, which could be a win or a loss. A read instead of a write (assuming the same file size) would in most cases (thinking spinning disks, or over a network) be faster or much the same. All in all, this seems like a net benefit, or neutral at worst.
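To make that concrete, here's a minimal sketch of the resume decision (hypothetical helper names, not the actual Unison code): if a partial target already digests to the same fingerprint as the source, the copy step is skipped.

```ocaml
(* A minimal sketch of the resume decision with hypothetical names;
   this is not the actual Unison code. *)

let digest_file (path : string) : Digest.t =
  Digest.file path  (* reads the whole file, like Unison's fingerprint pass *)

let transfer ~(src : string) ~(dst : string) ~(copy : unit -> unit) : unit =
  let src_fp = digest_file src in                (* read source *)
  let resumed =
    Sys.file_exists dst
    && Digest.equal src_fp (digest_file dst)     (* read partial target *)
  in
  if not resumed then copy ();                   (* write only when needed *)
  (* the "paranoid check": re-read the target and compare fingerprints *)
  assert (Digest.equal src_fp (digest_file dst))
```

In this sketch any mismatching partial falls back to a full copy; appending only the missing tail is a separate mechanism, which comes up below.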
Are there some obvious drawbacks that I'm missing? Why was this functionality not enabled for local syncs?
I think the idea that "local" implies two fast disks (at least spinning fast) is incorrect. Many things can be mounted via NFS, FUSE, or some other mechanism. So I don't see justification for different processing. A good question to ask, but I tend towards believing that this simply used to be a good idea, and, after asking and hearing nothing, towards not being overly deferential to the past.
It's not clear to me that this PR makes the local sync case behave the same as the remote case, and if there is any remaining behavior difference, what the rationale is for that.
I'm afraid I don't remember the reasons why local sync was treated differently -- I incline toward Greg's theory that it was mostly because the local case was assumed to be fast, so it didn't need the extra mechanism. In any case, moving to uniform treatment seems like a good idea (with the caveat that this may involve touching some fairly convoluted code, so plenty of testing is in order).
> It's not clear to me that this PR makes the local sync case behave the same as the remote case, and if there is any remaining behavior difference, what the rationale is for that.
No, it's still not exactly the same as the remote case. The remote case goes through more steps, very roughly like this (not intended to be accurate):
- resume check (same as for local with this patch)
- replica-local copy shortcut (one would think it makes no sense with local sync, but it's not so straightforward; this was discussed in #472)
- use external `copyprog` if requested; apparently there's one more resume check here, which is not included in this PR: it checks whether a file was partially transferred and appends the remaining contents (see the sketch after this list). This one I'll definitely look into adding for the local case as well.
- the contents are transferred with the rsync algorithm
- finally the paranoid check (same as with local sync)
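For illustration, here's a rough sketch of that append-style resume. The names are invented, and the crucial check that the existing partial prefix actually matches the source (the resume check) is elided:

```ocaml
(* Sketch only: append the missing tail of src onto a partial dst.
   A real implementation must first verify that the existing prefix
   of dst matches src, which is elided here. *)

let existing_len (path : string) : int =
  if Sys.file_exists path then begin
    let ic = open_in_bin path in
    let n = in_channel_length ic in
    close_in ic;
    n
  end else 0

let append_resume ~(src : string) ~(dst : string) : unit =
  let start = existing_len dst in
  let ic = open_in_bin src in
  let oc = open_out_gen [Open_wronly; Open_creat; Open_binary] 0o600 dst in
  seek_in ic start;                     (* skip what's already there *)
  seek_out oc start;                    (* and append after it *)
  let buf = Bytes.create 65536 in
  let rec loop () =
    let n = input ic buf 0 (Bytes.length buf) in
    if n > 0 then (output oc buf 0 n; loop ())
  in
  loop ();
  close_in ic;
  close_out oc
```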
It is not completely clear to me whether it is trivial to make local sync behave the same as the remote case, nor whether that is something to desire (the rsync thing does sound nice in theory).
This may be unreasonable, but I wonder about making all syncs just use the remote path, skipping ssh/socket and having a second unison locally. But making local more like remote seems good.
I went ahead and dug a little more into the code. With some hacking in the Remote module I got the remote copy code working with local sync, and it seems to Just Work (whether it is all optimal is another question).
The reason I had to hack in the Remote module is that the interface the Copy module uses requires a connection. The connection is actually not needed, as all other remote function calls have a proxy that just calls the function locally in the same process. Adding this proxy worked (seemingly) perfectly (and why shouldn't it have -- all other modules use this proxied Remote interface, which doesn't require a connection).
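Roughly, the proxy idea looks like this (a sketch with made-up names; Unison's actual Remote interface registers marshaled functions, which I'm not reproducing here):

```ocaml
(* Sketch with made-up names: the caller uses one interface whether
   the peer is a real connection or the local process itself. *)

type connection =
  | Local                      (* both replicas live in this process *)
  | Remote of out_channel      (* a real peer over ssh/socket *)

let invoke (conn : connection) (f : 'a -> 'b) (arg : 'a) : 'b =
  match conn with
  | Local -> f arg             (* the proxy: a plain in-process call *)
  | Remote _ ->
      (* marshaling the call over the wire is elided in this sketch *)
      failwith "wire path not shown"
```

With a `Local` case like this, Copy can keep calling through the same interface, and the connection requirement simply disappears for local syncs.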
What I think I'll do next is add the partial-file resume (append) functionality for local syncs in this PR, as that seems useful. But... this functionality, too, currently depends on the Remote module interface without the proxy code, so it is maybe not worth hacking at the moment.
As a next, bigger step, I will look into properly changing the remote function call proxies in the Copy module (as mentioned, I actually did this already; the emphasis here is on properly). This will allow the entire remote code path to be used locally. It should require very little code change but a whole lot of testing. And then some more testing. Are we ready for that?
My overall reaction is that fixing this right, even if it's a bigger step, is preferred, and that it's less total work to test a major rework once than to test a bunch of small steps. (And it's just local sync, and people can always sync via ssh to self if things get really troubled. That's assuming something sneaks by after we have the usual 30 test volunteers :-( )
That bigger change is now #572. I will keep this PR still open for a while but if it seems that the other one is The Right Thing to do then I will close this one.