unison
Graceful stop in repeat mode - part 2
Background and motivation in #554. Closes #554. cc @bruceg
This PR adds several methods to fix the original request from #554, namely a "graceful shutdown". Not all of these methods need to be merged. They all have their pros and cons. I'm hoping for a discussion to reach a decision on which methods (possibly all) will be merged. I've kept commits separate for easier review and selection.
"Graceful shutdown" means that a stop can be requested at any time but the process is stopped only after any ongoing sync is completed (it does not need to be successful, just finished for the moment). This is complementary to #810.
This PR includes the following methods for a "graceful shutdown":
- signal SIGUSR2
- reading a "stop condition" from (possibly redirected) stdin:
  - this can be an EOF (Ctrl-D from the terminal); not available if stdin is a regular file or is already at EOF at startup (such as `/dev/null`), or
  - byte 0x04 (other input is silently ignored), or
  - closing stdin (if it was open at process start)
For more details please also see the changes to the manual in this PR.
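The stdin stop conditions listed above boil down to one decision made after every read from stdin. The following is a minimal Python sketch of that decision, not Unison's actual OCaml code; `is_stop_request` and its `eof_counts` flag are hypothetical names, and the caller is assumed to have already worked out whether EOF may count (it may not for a regular file or for stdin that was already at EOF at startup).

```python
EOT = 0x04  # ASCII EOT, the byte value of Ctrl-D


def is_stop_request(chunk: bytes, eof_counts: bool) -> bool:
    """Return True if the result of one read() on stdin requests a graceful stop.

    chunk      -- bytes returned by a single read; b"" means EOF, which is also
                  what the reader sees once the writing end of a pipe is closed.
    eof_counts -- False when EOF must be ignored, e.g. stdin is a regular file
                  or was already at EOF at startup (such as /dev/null).
    """
    if chunk == b"":
        return eof_counts
    # A 0x04 anywhere in the chunk requests a stop; all other input is ignored.
    return EOT in chunk
```

For example, `is_stop_request(b"abc\x04", eof_counts=False)` is `True`, while `is_stop_request(b"", eof_counts=False)` is `False`.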
With the stdin method one has basically endless possibilities and flexibility to send a stop request. Here are some examples to get the imagination going:
- `sleep 60 | unison ... -repeat`: automatically requests a stop of the repeat loop after 60 seconds; it stops when the current sync is completed
- `echo | unison -batch ...`: guaranteed to stop after one sync, even if there is a `repeat` preference in the (default) profile; could be used in a script to guarantee termination when not all preference values are known
- `mkfifo fifoname ; cat fifoname | unison ... -repeat`: request a stop of the repeat loop by writing 0x03 or 0x04 into the named pipe (it doesn't even matter what is written, because EOF/closing will also work)
- `ncat -l 45678 | unison ... -repeat`: request a stop of the repeat loop by opening a TCP connection to port 45678 and writing 0x04 into the socket (again, it doesn't matter what is written, because EOF/closing will also work); in `bash` you can even do `echo quit > /dev/tcp/yourhost/45678`
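For the named-pipe and TCP examples, the side that sends the stop request does not have to be a shell one-liner. Below is a minimal Python sketch of that side; the function names are hypothetical, and the only thing it relies on is the 0x04 byte (or the EOF/closing that follows when the writer goes away) described above.

```python
import socket

EOT = b"\x04"  # stop-request byte recognised on Unison's stdin


def send_stop_via_fifo(path: str) -> None:
    """Request a stop through `mkfifo fifoname ; cat fifoname | unison ... -repeat`.

    open() blocks until `cat` has the FIFO open for reading. Closing the FIFO
    afterwards also makes `cat` see EOF and exit, which closes Unison's stdin.
    """
    with open(path, "wb") as fifo:
        fifo.write(EOT)


def send_stop_via_tcp(host: str, port: int, timeout: float = 10.0) -> None:
    """Request a stop through `ncat -l 45678 | unison ... -repeat`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(EOT)
```

Usage would be `send_stop_via_fifo("fifoname")` or `send_stop_via_tcp("yourhost", 45678)`.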
An overview of the different methods and some of their strengths and weaknesses:
| Method | Pros | Cons | Comments |
|---|---|---|---|
| SIGUSR2 | | | |
| stdin (terminal) | | | It would be best to read from stdin in a non-blocking way. This can be done by putting stdin into non-blocking mode (POSIX) or by changing terminal settings (POSIX and Windows). But both methods are very fragile, because the terminal is shared with other applications that will either be affected or change terminal settings for themselves. On top of that, the process can run in the background or the foreground and be switched between those modes at any moment. All this becomes very tricky. That's why I haven't included this code in the PR (although it works well if these concerns are ignored). |
| stdin (not a regular file) | | | |
| stdin (regular file) | | | Not sure whether a regular file redirected to stdin should be supported. Reading from a file requires constant looping (the interval is currently 1 s), which means it is either unreasonably busy or, conversely, quite slow to react. Without looping, EOF would be reached immediately. Also, the current code only reads appended input (not a major problem, I think). I've excluded files that are initially larger than 32 bytes (arbitrary limit). |
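The regular-file comment in the table comes down to this: a pipe, socket or terminal can be waited on with select(2), while a regular file has to be polled. Here is a minimal Python sketch of such a wait between two syncs, assuming the 1 s poll interval mentioned above; `wait_between_syncs` is a hypothetical name, and the 32-byte size limit and the "already at EOF at startup" rule are left out for brevity.

```python
import os
import select
import stat
import time

POLL_INTERVAL = 1.0  # seconds between reads when stdin is a regular file
EOT = 0x04           # stop-request byte


def wait_between_syncs(fd: int, delay: float) -> bool:
    """Wait up to `delay` seconds; return True if `fd` requested a graceful stop."""
    regular = stat.S_ISREG(os.fstat(fd).st_mode)
    deadline = time.monotonic() + delay
    while (remaining := deadline - time.monotonic()) > 0:
        if regular:
            # select() would return at once for a regular file, so sleep and
            # re-read instead. Reads continue from the current offset, so only
            # bytes appended since the previous read are seen; b"" is ignored.
            time.sleep(min(POLL_INTERVAL, remaining))
            if EOT in os.read(fd, 4096):
                return True
        else:
            # Pipes, sockets and terminals: block without using CPU until
            # input arrives or the wait is over.
            ready, _, _ = select.select([fd], [], [], remaining)
            if ready:
                data = os.read(fd, 4096)
                if data == b"" or EOT in data:  # EOF/closed, or 0x04
                    return True
    return False
```

The polling branch is what makes the regular-file case either busy or slow to react: a shorter `POLL_INTERVAL` reacts faster but wakes up more often.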
You may have noticed a common theme in my comments: Windows. A solution based on signals is fine, but it leaves Windows out completely. Except for SIGINT, which Windows does know about... but which is sent to all processes sharing the console, so it's not really helpful.
I guess the $64,000 question is: are we willing to have Unix do things the wrong way because there is no right way on Windows? I will think more.
I've removed the SIGQUIT method. It really was redundant and only made the code more complex. Now the diff is smaller and easier to review.
In principle, I'm also OK with allowing `/dev/null` as stdin, but I haven't added the code yet because it would make the code more complex again. I'm holding off on this until the concept has been reviewed and agreed upon.
Status update.
What is currently implemented (not everything is merged yet).
To interrupt/terminate immediately (already merged)
- SIGINT tries to terminate the process as quickly as possible, interrupting the ongoing sync but still doing the cleanup (which requires communication with the remote end)
- Sending SIGINT multiple times terminates immediately, skipping the cleanup
- SIGTERM works the same way as SIGINT
- This works in all modes, interactive and batch, single run and repeat mode
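The escalation described above (first SIGINT/SIGTERM: interrupt the sync but still clean up; another one: exit at once) is a common signal-handling pattern. Below is a minimal Python sketch of it, not Unison's implementation; `SyncInterrupted`, `install_handlers` and `run_sync_with_cleanup` are hypothetical names, and exiting with status 128 + signal number is an assumed shell-style convention.

```python
import os
import signal


class SyncInterrupted(Exception):
    """Raised in the main thread to abandon the ongoing sync."""


_signals_seen = 0


def _on_terminate(signum, frame):
    global _signals_seen
    _signals_seen += 1
    if _signals_seen == 1:
        # First SIGINT/SIGTERM: stop the sync, but let cleanup handlers run.
        raise SyncInterrupted(signal.Signals(signum).name)
    # Any further SIGINT/SIGTERM: skip the cleanup and exit immediately.
    os._exit(128 + signum)


def install_handlers() -> None:
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, _on_terminate)


def run_sync_with_cleanup(sync, cleanup) -> None:
    try:
        sync()
    finally:
        # Cleanup (e.g. talking to the remote end) still runs after the first
        # signal; a second signal arriving here ends the process at once.
        cleanup()
```

Raising an exception from the handler is what lets the normal `finally` cleanup path run, while `os._exit` bypasses it deliberately.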
To stop the repeat mode without interrupting ongoing sync (this PR)
Continue with any ongoing sync, just don't start the next repeat loop after this one.
- SIGUSR2 (not available in Windows)
- stdin
  - was initially open and is now closed
  - EOF received (in some cases only; regular files and `< /dev/null` will not trigger this condition, for example)
  - byte 0x04 received
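The SIGUSR2 path is the simplest of these: the signal handler only records the request, and the repeat loop checks it between syncs. A minimal Python sketch of that pattern follows (POSIX only, since `signal.SIGUSR2` does not exist on Windows); `repeat_loop` and `sync_once` are hypothetical stand-ins, not Unison's OCaml code.

```python
import signal
import threading

stop_requested = threading.Event()


def on_sigusr2(signum, frame):
    # Only record the request; the sync that is running is not interrupted.
    stop_requested.set()


def repeat_loop(sync_once, wait_seconds: float) -> int:
    """Run `sync_once` repeatedly until a graceful stop is requested.

    The flag is looked at only between syncs, so a sync that is already
    running always finishes. Returns the number of completed syncs.
    """
    signal.signal(signal.SIGUSR2, on_sigusr2)
    completed = 0
    while not stop_requested.is_set():
        sync_once()
        completed += 1
        # Sleep until the next cycle, but wake up early if the flag gets set.
        if stop_requested.wait(wait_seconds):
            break
    return completed
```

From another shell, `kill -USR2 <pid>` would then end the loop after the current sync.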
All the changes are now pushed in this PR.
Now that this has settled, I looked through it in more detail.
I'm totally ok with merging the USR2 stuff as I see no downsides.
I think the stdin stuff is ok, but I would like to see tighter interface specs to avoid limiting us later, and more clarity about the rules; I find "stdin is open" to be confusing.
There is also presumably some plan (I didn't read the code super carefully) to recheck a regular file via seek/read every N seconds. I hope this does not update atime, and in general I'm a little worried about the process doing things and using CPU and fs ops all the time when Unison should be more or less quiescent. All of this makes me wonder about a command-line arg (not a preference) to enable the stdin checking behavior.
> Now that this has settled, I looked through it in more detail.
> I'm totally ok with merging the USR2 stuff as I see no downsides.
Ok, I may very well split the PR to get the foundations out of the way.
> I think the stdin stuff is ok, but I would like to see tighter interface specs to avoid limiting us later, and more clarity about the rules; I find "stdin is open" to be confusing.
OK, let's work on this. I don't quite remember where "stdin is open" came from (is it even possible on POSIX for stdin not to be open?). It may be a Windows thing, where it is possible to have no stdin at all, but that only applies to GUI applications, which is clearly not the case here. If we conclude that stdin is always open (although, as you can see with nohup, it does not have to be readable), then we can drop all that language.
> There is also presumably some plan (I didn't read the code super carefully) to recheck a regular file via seek/read every N seconds. I hope this does not update atime, and in general I'm a little worried about the process doing things and using CPU and fs ops all the time when Unison should be more or less quiescent. All of this makes me wonder about a command-line arg (not a preference) to enable the stdin checking behavior.
It's not that bad. select(2) is used, so no extra CPU or fs ops are consumed constantly or regularly. I specifically made sure that the process wouldn't do anything extra; if it does, that's a bug. I don't know about updating the atime, but if you have redirected a file to stdin, then you have yourself requested that file to be read, so I'd say that's not a worry.
Edit: select(2) doesn't work on regular files (of course: a regular file is always reported as readable), so there is a loop with a 1 second interval that reads from stdin if it's a regular file. That loop is active only during the wait between sync cycles, but this wait can be several hours long.
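Why select(2) is useless for regular files is easy to check: it reports a regular file as readable even when a read would only return EOF, so it cannot be used to wait for data appended to the file. A small POSIX-only Python demonstration; `select_reports_readable` is just a helper name for this example.

```python
import os
import select
import tempfile


def select_reports_readable(fd: int, timeout: float = 0.0) -> bool:
    """True if select() says a read on `fd` would not block."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return fd in ready


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        # Empty regular file: "readable" at once, yet read() returns only EOF.
        print(select_reports_readable(f.fileno()), os.read(f.fileno(), 16))
    r, w = os.pipe()
    # Idle pipe whose writer is still open: select() waits out the timeout.
    print(select_reports_readable(r, 0.1))
    os.write(w, b"\x04")
    print(select_reports_readable(r))
```

On Linux and macOS this prints `True b''`, then `False`, then `True`: the pipe behaves as a wait primitive, the regular file does not.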
Please do split USR2 and we can merge that right away and maybe get some user testing.
It strikes me that the stdin mechanism is ending up a bit awkward. I see the reason for each step but it is still complicated.
I wonder if there is a way (one that also works on Windows), instead of the stdin mechanism, to do:
```
unison <profile> -repeat -graceful-id 57 &
[long time later]
unison -graceful-stop 57
```
and have that, under the covers (the above being the interface spec), open a FIFO (or a regular file, if it has to be on Windows) in `$UNISONHOME`.
It doesn't have the `fg<CR>^D` simplicity, but it is very scriptable and straightforward. It avoids the race condition (assuming the second command waits until the FIFO appears, or gives up after a minute).
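To make the proposed interface a bit more concrete, here is a minimal POSIX-only Python sketch of what could happen under the covers. Everything in it is an assumption rather than an agreed design: the FIFO name `graceful-<id>.fifo`, reusing the 0x04 byte, the 0.5 s retry interval and the function names are all hypothetical, FIFO removal on exit is not shown, and the regular-file fallback for Windows (where `os.mkfifo` does not exist) is left out.

```python
import errno
import os
import time
from pathlib import Path

EOT = b"\x04"  # assumption: reuse the stdin stop byte


def stop_fifo_path(home: Path, graceful_id: int) -> Path:
    # Hypothetical naming scheme inside $UNISONHOME.
    return home / f"graceful-{graceful_id}.fifo"


def create_stop_fifo(home: Path, graceful_id: int) -> int:
    """`unison <profile> -repeat -graceful-id N`: create the FIFO and open it.

    O_NONBLOCK lets open() return even though no writer exists yet; the
    returned descriptor could then be watched between syncs like stdin.
    """
    path = stop_fifo_path(home, graceful_id)
    os.mkfifo(path, 0o600)
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def graceful_stop(home: Path, graceful_id: int, give_up_after: float = 60.0) -> bool:
    """`unison -graceful-stop N`: wait for the FIFO, then write the stop byte."""
    path = stop_fifo_path(home, graceful_id)
    deadline = time.monotonic() + give_up_after
    while time.monotonic() < deadline:
        if path.is_fifo():
            try:
                # Non-blocking open fails with ENXIO while nobody reads the FIFO.
                fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
            else:
                try:
                    os.write(fd, EOT)
                finally:
                    os.close(fd)
                return True
        time.sleep(0.5)
    return False
```

Retrying until the FIFO exists and has a reader is what addresses the race: `-graceful-stop` can be issued before the repeating instance has set up its FIFO, and it gives up after a minute otherwise.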