Add separate scan, scan, compare, act steps -- to work around scaling problems

Open rrnewton opened this issue 8 years ago • 4 comments

When dealing with large archives on external drives with many files, I run into scaling problems where something goes wrong before a unison run completes. Does anyone else have this problem?

Has there ever been any consideration of breaking down the unison phases to be runnable with separate process invocations, rather than assuming the unison process on machine A and machine B are live for the entire length of the interaction?

Specifically I would like to break out these phases:

  1. scan directory A on first host (possibly in parts!);
  2. scan directory B on second host;
  3. go online and compare the scans;
  4. (optional) user audits the resulting action plan; and finally
  5. execute the action plan (possibly with disconnect/retry, but without redoing previous steps)
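
For illustration only, here is a minimal Python sketch of how steps 1 to 3 could be separated so that each scan is written to disk and the comparison runs later. This is not how Unison works internally: the function names, the JSON manifest format, and the use of size and mtime as the change signal are all assumptions made for the example. Unlike Unison, it keeps no archive of the last synchronized state, so it cannot tell a deletion on one side from a creation on the other.

```python
import json
import os


def scan(root):
    """Steps 1/2: walk `root` and record [size, mtime] per relative path."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            manifest[rel] = [st.st_size, int(st.st_mtime)]
    return manifest


def save_manifest(manifest, path):
    """Persist a scan so the comparison can happen in a later process."""
    with open(path, "w") as f:
        json.dump(manifest, f)


def load_manifest(path):
    with open(path) as f:
        return json.load(f)


def compare(manifest_a, manifest_b):
    """Step 3: build an action plan as a sorted list of (action, path)."""
    plan = []
    for path in sorted(set(manifest_a) | set(manifest_b)):
        a, b = manifest_a.get(path), manifest_b.get(path)
        if a == b:
            continue  # same size and mtime on both sides: nothing to do
        if b is None:
            plan.append(("copy A->B", path))
        elif a is None:
            plan.append(("copy B->A", path))
        else:
            plan.append(("conflict", path))
    return plan
```

The plan returned by `compare` is plain data, so it could be printed for the audit in step 4 and replayed in step 5 without rescanning.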

Of course, I would need to take responsibility that the file system doesn't change between these steps, but that's nothing new. Unison already needs to contend with racing modifications, whether the above steps happen in one process launch or on multiple discrete steps.

I realize this is not likely to happen, but I'm curious if others would hypothetically have a use for this, or if others could recommend a sync tool that would better handle these scaling problems. I'm finding that even new-style blob-store tools like restic tend to operate with an indivisible "one big scan" of the file system, which is the scaling bottleneck I'm trying to ameliorate. In fact, unison is already better than most in being able to take a -path argument and manually break down a big sync job into multiple parts.
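
As a rough sketch of the manual breakdown mentioned above, the following Python helper builds one `unison` command line per subpath using the `-path` option. The helper name, the profile name, and the example paths are hypothetical; only the `-path` flag itself comes from the discussion.

```python
def per_path_commands(profile, paths):
    """Return one unison argv list per path, so each part can be run separately."""
    return [["unison", profile, "-path", p] for p in paths]


# Hypothetical usage: split one big job into three smaller ones.
# for argv in per_path_commands("PROFILENAME", ["photos", "music", "docs"]):
#     subprocess.run(argv)
```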

rrnewton avatar Nov 28 '17 15:11 rrnewton

I like this idea. Regarding

> Of course, I would need to take responsibility that the file system doesn't change between these steps, but that's nothing new. Unison already needs to contend with racing modifications, whether the above steps happen in one process launch or on multiple discrete steps.

unison already protects the user against this, as long as we make sure the archives stay locked.

Now we just need to find someone to design and implement this.

brabalan avatar Nov 28 '17 19:11 brabalan

I'm actually wondering how hard this is. I think the answer depends on which part is actually taking most of the time. I can imagine two possibilities:

  1. The replicas contain large amounts of new data, so fingerprinting takes most of the time and the fingerprint cache doesn't help.
  2. The replicas contain only a little new data, but there are so many files that simply stat-ing them takes most of the time.

The second case is harder, but if it's the first case then wouldn't it be enough just to checkpoint the fingerprint cache every so often? If Unison then gets interrupted, the next run would still have to re-scan some of the filesystem, but maybe this is fast enough. (E.g., on my laptop, rescanning a terabyte or so takes only a few seconds if the fingerprint cache is hot.)
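
To make the checkpointing idea concrete, here is a small Python sketch, not Unison's actual fingerprint cache. The cache layout (a JSON file keyed by relative path), the SHA-256 digest, and the `checkpoint_every` parameter are assumptions chosen for the example. The cache is rewritten every few newly hashed files, so an interrupted run loses at most that much hashing work, and a later run skips any file whose size and mtime still match the cached entry.

```python
import hashlib
import json
import os


def load_cache(cache_path):
    """Load a previously checkpointed cache, or start empty."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    return {}


def save_cache(cache, cache_path):
    """Write to a temp file, then rename, so a crash never leaves a half-written cache."""
    tmp = cache_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cache, f)
    os.replace(tmp, cache_path)


def fingerprint_tree(root, cache_path, checkpoint_every=1000):
    """Hash files under `root`, checkpointing the cache periodically.

    Returns (cache, number_of_files_hashed_in_this_run).
    `cache_path` should live outside `root` so it is not scanned itself.
    """
    cache = load_cache(cache_path)
    hashed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            key = os.path.relpath(full, root)
            entry = cache.get(key)
            if entry and entry[:2] == [st.st_size, st.st_mtime_ns]:
                continue  # unchanged since it was cached: skip the expensive hash
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            cache[key] = [st.st_size, st.st_mtime_ns, digest.hexdigest()]
            hashed += 1
            if hashed % checkpoint_every == 0:
                save_cache(cache, cache_path)  # periodic checkpoint
    save_cache(cache, cache_path)
    return cache, hashed
```

Note that this only helps in the first case: every file is still stat-ed on each run, which is exactly the cost that dominates in the second case.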

bcpierce00 avatar Nov 28 '17 19:11 bcpierce00

In my case, I'm always using -fastcheck (Mac/Linux) and often I'm just trying to do a new scan of two disks I know are identical to set up unison for future, incremental invocations.

One kind of problem that I have is where the ssh connection seems to have died either before one host finishes scanning, or before I get back to the computer to audit the results in the GUI. This could maybe just be fixed by some kind of UI option for "reconnect/retry".

rrnewton avatar Nov 28 '17 21:11 rrnewton

I was afraid you were talking about the hard case. :-)

My own workaround for this is:

    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
    unison PROFILENAME < /dev/null
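
The repeated invocation above could also be scripted. The following Python sketch runs a command a fixed number of times with stdin connected to the null device, which mirrors `< /dev/null`. The function name, the `times` default, and the optional `stop_on_success` behaviour (stopping early on exit status 0) are assumptions of this example, not part of the original workaround, which simply runs the command repeatedly.

```python
import subprocess


def run_repeatedly(argv, times=9, stop_on_success=False, run=subprocess.run):
    """Run `argv` up to `times` times with no stdin; return the exit codes seen.

    `run` is injectable so the loop can be exercised without a real unison binary.
    """
    codes = []
    for _ in range(times):
        result = run(argv, stdin=subprocess.DEVNULL)
        codes.append(result.returncode)
        if stop_on_success and result.returncode == 0:
            break  # assumption: exit status 0 means there is nothing left to do
    return codes


# Hypothetical usage, with PROFILENAME as the placeholder from the comment above:
# run_repeatedly(["unison", "PROFILENAME"])
```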

bcpierce00 avatar Nov 28 '17 21:11 bcpierce00

This issue is really a discussion, and there's no recent data about problems. There is also not a good argument that exiting the process and restarting will help, assuming no leaks, and if there are leaks, we should fix them anyway. And, it's really clear that nobody is going to pick up this ticket from the queue and implement it :-)

Feel free to discuss the big picture on unison-hackers.

gdt avatar Mar 22 '23 15:03 gdt