gitoxide Ensure robustness when killed in the middle

git is somewhat robust when killed in the middle. libgit2 seems somewhat less so. I'd love to see gitoxide be rock-solid on that front.

When doing things like clone, fetch, and similar, it's important to deal with network failure, ctrl-c, systems being powered off, and other sources of abrupt interruption. It's important that this never leave the repository in an inconsistent state (e.g. references to objects that don't exist, corrupted packs or indexes that git will choke on, and similar). This includes things like putting things in temporary locations and atomic-renaming them in place, or ensuring that objects are put in place before references to them.

Sep 23 '20 20:09 joshtriplett

I absolutely agree, thanks for making that explicit.

How would you create multiple files atomically? This would be required to perfectly conclude a clone, which currently is done in a plumbing command by creating refs sequentially.

Would you go as far as to maintain an undo list to respond to interrupts properly?

Sep 23 '20 21:09 Byron

@Byron Catching interrupts won't handle kill -9, or a power failure, or similar.

You don't have to make entire operations atomic when that isn't possible. It's OK if an interrupted fetch leaves some refs created and others not; another fetch will clear that up. (Clones can be made atomic by doing them into a temporary directory and renaming that directory into place.) What must not happen is a ref getting created but the object it references not existing, or an incomplete pack existing where git looks for packs, or other cases where the repository is in an inconsistent state. There should always be an order you can perform operations in such that the repository remains consistent.

Sep 24 '20 04:09 joshtriplett

I see, so rather than complicating things try to design operations to never corrupt the git repository, and ideally recover automatically next time the operation is run in case resources have been leaked, like some refs still being present after interruption.

In the same vein, operations should probably be hardened against races, too. For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process. Thinking about it, since there is a special kind of temp file which is always cleaned up if a process is killed, that could possibly be used as marker to differentiate this case as well.

Generally, when seeing the .git repository as database, all best-practices should certainly be applied to prevent corruption and allow (auto) recovery.

Sep 24 '20 07:09 Byron

For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process.

You could potentially handle that with file locks, which don't outlive the process holding them.

Sep 24 '20 23:09 joshtriplett

git-tempfile and git-lock will help for sure in keeping the repository consistent in cases that are not kill -9. It's still to be figured out how to respond to stray locks though, especially on the server side where doing so should probably be automated.

Jun 24 '21 06:06 Byron

It is potentially interesting to note that git.git repacks refs when more than one is updated in a transaction (for —atomic support). Last time I checked, libgit2 doesn’t do that, making ref transactions not actually atomic.

Jun 25 '21 07:06 kim

@kim And on top of that one has to write the reflog which isn't contained in the packed-refs file and consist of a file per ref. I will still have to see how exactly that is locked, but I am hopeful it's made in a way to not completely tank performance.

Jun 26 '21 01:06 Byron

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Guess you can see now why I keep crying “reftable” ;)

Jun 26 '21 09:06 kim

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Super interesting, thanks for sharing! I checked it against my current sketch for transactions and am glad this would naturally be supported. I also noted that in bare repos, no reflog is created otherwise which wasn't on my radar yet.

Guess you can see now why I keep crying “reftable” ;)

There is no way around it on the server for sure. Since it operates in blocks it's probably less likely to be overwhelmed by the amount of filesystem events that it produces. But whatever it's going to be there is no escape if there are too many events it's either the files backend or the reftable.

Jun 26 '21 09:06 Byron

if there are too many events

I misspoke there, it's more related to the notify crate not allowing the set event masks for portability reasons, and also the inherent raciness of watching directories recursively.

Jun 28 '21 14:06 kim

gitoxide gitoxide copied to clipboard

Ensure robustness when killed in the middle

gitoxide
gitoxide copied to clipboard