Ensure robustness when killed in the middle
git is somewhat robust when killed in the middle. libgit2 seems somewhat less so. I'd love to see gitoxide be rock-solid on that front.
When doing things like clone, fetch, and similar, it's important to deal with network failure, ctrl-c, systems being powered off, and other sources of abrupt interruption. This must never leave the repository in an inconsistent state (e.g. references to objects that don't exist, corrupted packs or indexes that git will choke on, and similar). This includes techniques like writing to temporary locations and atomically renaming into place, or ensuring that objects are in place before the references to them.
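For the single-file case, a minimal sketch of that pattern in Rust, using only the standard library (the function name and the fixed `.tmp` suffix are illustrative, not gitoxide's actual API):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Write `contents` so readers see either the old file or the
/// complete new one, never a partial write.
fn write_atomic(dest: &Path, contents: &[u8]) -> std::io::Result<()> {
    // The temp file must live on the same filesystem as `dest`,
    // otherwise the final rename cannot be atomic.
    let tmp = dest.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush to disk before the rename so a power failure cannot
    // leave a renamed-but-incomplete file behind.
    file.sync_all()?;
    // On POSIX, rename() atomically replaces any existing `dest`.
    fs::rename(&tmp, dest)?;
    Ok(())
}
```

Full durability would additionally require syncing the containing directory after the rename, and a unique temp-file name rather than a fixed suffix to avoid racing concurrent writers.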
I absolutely agree, thanks for making that explicit.
How would you create multiple files atomically? This would be required to perfectly conclude a clone, which currently is done in a plumbing command by creating refs sequentially.
Would you go as far as to maintain an undo list to respond to interrupts properly?
@Byron Catching interrupts won't handle `kill -9`, or a power failure, or similar.
You don't have to make entire operations atomic when that isn't possible. It's OK if an interrupted fetch leaves some refs created and others not; another fetch will clear that up. (Clones can be made atomic by doing them into a temporary directory and renaming that directory into place.) What must not happen is a ref getting created but the object it references not existing, or an incomplete pack existing where git looks for packs, or other cases where the repository is in an inconsistent state. There should always be an order you can perform operations in such that the repository remains consistent.
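A sketch of that clone-into-temporary-directory idea; `fetch_into` is a hypothetical stand-in for the actual clone work, not a real gitoxide function:

```rust
use std::fs;
use std::path::Path;

// Hypothetical: downloads objects, writes refs and config into `dir`.
fn fetch_into(_url: &str, _dir: &Path) -> std::io::Result<()> {
    Ok(())
}

/// The destination only ever exists as a complete repository: all
/// work happens in a sibling staging directory on the same
/// filesystem, and a single atomic rename publishes it.
fn clone_atomically(url: &str, dest: &Path) -> std::io::Result<()> {
    let staging = dest.with_extension("partial");
    fs::create_dir_all(&staging)?;
    fetch_into(url, &staging)?;
    fs::rename(&staging, dest)?;
    Ok(())
}
```

An interrupted run leaves only the staging directory behind, which a later clone attempt can detect and clean up.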
I see, so rather than complicating things, try to design operations so they never corrupt the git repository, and ideally recover automatically the next time the operation is run, in case resources have been leaked, like some refs still being present after interruption.
In the same vein, operations should probably be hardened against races, too. For example, cloning into the same location will have one `git` process fail early, as it races only for the creation of the `.git` directory instead of racing to move the cloned repository into place. Interestingly, when `kill -9`'d, git probably would be unable to recover such a partial clone, as it couldn't differentiate a race from a dead process. Thinking about it, since there is a special kind of temp file which is always cleaned up if a process is killed, that could possibly be used as a marker to differentiate this case as well.
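On that "special kind of temp file": the `tempfile` crate's `tempfile()` is one such thing. It unlinks the file immediately (using `O_TMPFILE` on Linux where available), so the kernel reclaims it no matter how the process dies, `kill -9` included:

```rust
use std::io::Write;

fn main() -> std::io::Result<()> {
    // The file has no name on disk, so no cleanup code ever needs
    // to run -- the kernel frees it when the last handle closes,
    // even after SIGKILL or a crash.
    let mut scratch = tempfile::tempfile()?;
    scratch.write_all(b"partial pack data ...")?;
    Ok(())
}
```

Since such a file is invisible to other processes by name, using it as a liveness marker would need extra machinery, which is where the file locks mentioned below come in.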
Generally, when seeing the `.git` repository as a database, all best practices should certainly be applied to prevent corruption and allow (auto-)recovery.
> For example, cloning into the same location will have one `git` process fail early, as it races only for the creation of the `.git` directory instead of racing to move the cloned repository into place. Interestingly, when `kill -9`'d, git probably would be unable to recover such a partial clone, as it couldn't differentiate a race from a dead process.
You could potentially handle that with file locks, which don't outlive the process holding them.
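A sketch of that, using the third-party `fs2` crate for advisory locks (any flock-style wrapper would do): because the kernel releases the lock on process death, a lock file whose lock can still be acquired must have been left by a dead process:

```rust
use fs2::FileExt; // third-party crate providing advisory file locks
use std::fs::File;
use std::path::Path;

/// Returns Some(handle) if we now own the operation, None if a live
/// process already holds the lock. The lock dies with the process,
/// `kill -9` included, so it can never go stale.
fn try_claim(marker: &Path) -> std::io::Result<Option<File>> {
    let file = File::create(marker)?;
    match file.try_lock_exclusive() {
        // Keep the handle alive for as long as the lock is needed.
        Ok(()) => Ok(Some(file)),
        Err(_) => Ok(None),
    }
}
```

The usual caveat applies: advisory locks are unreliable on some network filesystems, which is presumably one reason git sticks to plain lock files created exclusively.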
`git-tempfile` and `git-lock` will help for sure in keeping the repository consistent in cases that are not `kill -9`. It's still to be figured out how to respond to stray locks though, especially on the server side where doing so should probably be automated.
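One way such automation could look, as an assumption rather than anything `git-lock` actually implements: treat a lock as stale once it exceeds a generous age threshold:

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Remove `lock` if it is older than `max_age`, assuming its creator
/// died without cleaning up. Inherently racy: a slow but live writer
/// could exceed the threshold, so the age must be chosen generously.
fn reap_stale_lock(lock: &Path, max_age: Duration) -> std::io::Result<bool> {
    let modified = fs::metadata(lock)?.modified()?;
    let age = SystemTime::now()
        .duration_since(modified)
        .unwrap_or_default();
    if age > max_age {
        fs::remove_file(lock)?;
        return Ok(true);
    }
    Ok(false)
}
```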
It is potentially interesting to note that `git.git` repacks refs when more than one is updated in a transaction (for `--atomic` support). Last time I checked, `libgit2` doesn't do that, making ref transactions not actually atomic.
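For context, the usual two-phase shape of a loose-ref transaction, sketched here with hypothetical helpers rather than libgit2's or gitoxide's API; the commit phase is exactly where per-file updates stop being atomic, which is why rewriting all refs in a single `packed-refs` file is the more robust route:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::PathBuf;

/// Update several loose refs: take every `<ref>.lock` before
/// committing anything, so conflicts abort with no ref changed.
fn update_refs(updates: &[(PathBuf, String)]) -> std::io::Result<()> {
    let mut locked = Vec::new();
    // Phase 1: lock and stage each ref. (A real implementation
    // would remove the locks it already created on failure.)
    for (ref_path, new_oid) in updates {
        let lock_path = ref_path.with_extension("lock");
        let mut lock = File::options()
            .write(true)
            .create_new(true) // fails if another process holds the lock
            .open(&lock_path)?;
        writeln!(lock, "{new_oid}")?;
        lock.sync_all()?;
        locked.push((lock_path, ref_path.clone()));
    }
    // Phase 2: commit by renaming lock files into place. A crash
    // here still leaves some refs updated and others not -- the
    // atomicity gap that a single packed-refs rewrite closes.
    for (lock_path, ref_path) in locked {
        fs::rename(&lock_path, &ref_path)?;
    }
    Ok(())
}
```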
@kim And on top of that one has to write the reflog, which isn't contained in the packed-refs file and consists of a file per ref. I will still have to see how exactly that is locked, but I am hopeful it's done in a way that doesn't completely tank performance.
Oh my yes reflogs. I do think they're a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance `pack-refs` would generate false remove events, and the volume of events would just be too high).
Guess you can see now why I keep crying “reftable” ;)
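Roughly what that watch setup looks like with the `notify` crate (the path and the cache-invalidation body are illustrative; kim's actual code isn't shown in this thread):

```rust
use notify::{recommended_watcher, RecursiveMode, Watcher};
use std::path::Path;

fn main() -> notify::Result<()> {
    // Watch the reflog of one chosen ref rather than the refdb
    // itself, so `pack-refs` churn can't produce false remove events.
    let mut watcher = recommended_watcher(|res: notify::Result<notify::Event>| {
        if res.is_ok() {
            // invalidate the in-memory cache entry for this ref
        }
    })?;
    watcher.watch(
        Path::new(".git/logs/refs/heads/main"),
        RecursiveMode::NonRecursive,
    )?;
    // The watcher must be kept alive for as long as events matter.
    std::thread::park();
    Ok(())
}
```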
> Oh my yes reflogs. I do think they're a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance `pack-refs` would generate false remove events, and the volume of events would just be too high).
Super interesting, thanks for sharing! I checked it against my current sketch for transactions and am glad this would naturally be supported. I also noted that in bare repos no reflog is created by default, which wasn't on my radar yet.
> Guess you can see now why I keep crying “reftable” ;)
There is no way around it on the server, for sure. Since reftable operates in blocks, it's probably less likely to be overwhelmed by the amount of filesystem events it produces. But whichever backend it's going to be, there is no escape if there are too many events, be it the `files` backend or `reftable`.
> if there are too many events
I misspoke there; it's more related to the `notify` crate not allowing event masks to be set (for portability reasons), and also the inherent raciness of watching directories recursively.