[gixp pack-receive] The first proper fetch to a bare repository

Open Byron opened this issue 3 years ago • 6 comments

Do what's needed to fetch as well as git does (on a bare repository, one without a working tree). This particularly includes proper ref handling as well as safety in light of concurrent repository access.

Tasks

  • fix gitoxide interrupt and signal handling
    • now it will work for CLIs and servers alike with fine-grained control and no global state (unless the application wants it)
  • git-tempfile (based on git
  • git-lock - a crate providing git-style lock files.
  • git-refs - write loose refs and handle the git reflog, temp files, lock files, packed-refs and namespaces
    • [x] publish latest release (and everything else, too)
  • git-pack - ensure packs are written safely, that is, writing won't interfere with multi-packs or other pack writers writing the very same pack. Check how locking works.
    • [x] #153
    • [x] thin pack support in bundle writer
  • gix clone
    • [ ] turn gixp pack-receive into gixp clone creating an empty repository (for lack of index handling/checkout) and cloning the first pack.
  • gix fetch
    • [x] #181
    • [ ] A tool to fetch into an existing repository correctly, creating a new pack and writing refs using transactions (for now without hook execution)
    • [ ] investigate fetch negotiation and see how much work is truly needed there. If logic is involved, make it readily reusable via git-protocol.
  • git-repository
    • Is there a way to bring transport/protocol related functionality to git-repository to greatly simplify ref-listings and fetches?

Archive

Research

Reflog Handling

  • entirely disabled in bare repos
  • forward iterators could be bstr::lines()
  • reverse-iterators could be bstr::SplitReverse with a VecDeque for refilling a read buffer from the end of a file with seeks.
  • line parsing is here
  • expiry is done by rewriting the entire file based on a filter, writing is literally here
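The line parsing mentioned above can be sketched as follows. This is only a minimal illustration, assuming the usual reflog line layout of `<old-oid> <new-oid> <signature>`, optionally followed by a tab and a message, and assuming 40-character SHA-1 hex ids. The names `ReflogLine` and `parse_reflog_line` are hypothetical and not taken from git-refs, and `&str` stands in for the `bstr` types the notes refer to.

```rust
/// One parsed reflog entry, borrowing from the input line.
/// (Hypothetical type for illustration, not the git-refs API.)
#[derive(Debug, PartialEq)]
pub struct ReflogLine<'a> {
    pub old_oid: &'a str,
    pub new_oid: &'a str,
    /// Name, email, timestamp and timezone, left unparsed in this sketch.
    pub signature: &'a str,
    /// Empty if the line carries no message.
    pub message: &'a str,
}

/// Parse a single reflog line, returning `None` if it is malformed.
pub fn parse_reflog_line(line: &str) -> Option<ReflogLine<'_>> {
    let line = line.trim_end_matches('\n');
    // The message is separated from the rest by a tab and may be absent.
    let (header, message) = match line.split_once('\t') {
        Some((header, message)) => (header, message),
        None => (line, ""),
    };
    // The header is two object ids followed by the signature, separated by spaces.
    let mut parts = header.splitn(3, ' ');
    let old_oid = parts.next()?;
    let new_oid = parts.next()?;
    let signature = parts.next()?;
    // Assumption: SHA-1 ids, i.e. exactly 40 hex characters.
    let is_hex_oid = |s: &str| s.len() == 40 && s.bytes().all(|b| b.is_ascii_hexdigit());
    if !is_hex_oid(old_oid) || !is_hex_oid(new_oid) {
        return None;
    }
    Some(ReflogLine { old_oid, new_oid, signature, message })
}
```

Forward iteration then amounts to mapping such a parser over the lines of the file.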

Refs Writing

  • You can turn a symbolic ref into a peeled one (i.e. detach a HEAD) with transactions but you cannot turn it back into a symbolic one with that. All that happens directly and outside of transactions.
  • Writing symbolic references like HEAD splits the ref update transparently and across any amount of refs.
  • You cannot delete ref logs using REF_LOG_ONLY but they are deleted with the owning reference.
  • ref transactions
    • there is a transaction hook which gets all transaction data without flags, that is old and new oid and refname, along with the 'action' indicating what happened to the transaction.
    • probably it should be possible to introspect transactions as they are executing, but theoretically this can also happen outside of the method itself.
  • git file lock
    • it looks like they are creating a tempfile with a specified name for locks (exclusive and all using atomic FS ops) which can then potentially be written in the same moment. Definitely good for loose refs that don't exist.
  • loose refs writing intricately knows packed refs, which makes sense in order to keep them consistent.

File Locking

  • investigate tempfile to conclude that it's certainly great as reference but won't be exactly what git does. Let's see if it's needed after all to do it exactly like that. Git definitely sets up signal handlers to delete tempfiles so probably these will have to be threadsafe or interned objects.
  • If directories are involved, use raceproof file creation
  • lockfile.c holds the entire blocking implementation, including backoff. Looks like that's git-lock.
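The general idea of a git-style lock file with blocking retries can be sketched with the standard library alone. This is not the git-lock API: `lock_path`, `acquire_lock` and `commit` are hypothetical names, the backoff is a plain doubling delay without randomization, and signal-safe cleanup of the lock file is left out entirely.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Path of the lock file guarding `resource`: the same path with `.lock` appended.
pub fn lock_path(resource: &Path) -> PathBuf {
    let mut name = resource.as_os_str().to_owned();
    name.push(".lock");
    PathBuf::from(name)
}

/// Try to create the lock file exclusively, sleeping with a doubling delay
/// between attempts while another holder exists. Gives up after `max_attempts`.
pub fn acquire_lock(resource: &Path, max_attempts: u32) -> io::Result<(File, PathBuf)> {
    let path = lock_path(resource);
    let mut delay = Duration::from_millis(1);
    let mut attempt = 1;
    loop {
        // `create_new` fails with `AlreadyExists` if the file is present,
        // which makes creation the atomic "take the lock" operation.
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(file) => return Ok((file, path)),
            Err(err) if err.kind() == io::ErrorKind::AlreadyExists && attempt < max_attempts => {
                thread::sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

/// Commit the lock: flush the new content and rename the lock file over the resource.
pub fn commit(lock: (File, PathBuf), resource: &Path) -> io::Result<()> {
    let (file, path) = lock;
    file.sync_all()?;
    drop(file);
    fs::rename(path, resource)
}
```

Writing the new content into the lock file and renaming it into place means readers see either the old or the new content, never a partial write.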

Reflogs

  • The file is read line by line and entries are handled on the fly using iterators, easiest to use bstr::lines() there.
  • reverse iterators use a buffer of 1024 bytes to seek lines backwards
  • parsing is here
  • for expiry the file is rewritten based on iteration
  • for new reflogs, these are appended (only)
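Reading lines backwards with a fixed-size buffer, as described for reverse iterators above, could look roughly like this. It is a sketch under simplifying assumptions: the hypothetical `lines_reversed` function collects all lines into a `Vec` instead of yielding them lazily, and it uses plain byte vectors rather than `bstr`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Return the lines of `reader` from last to first, without their trailing
/// newline, reading the input backwards in chunks of `chunk_size` bytes.
pub fn lines_reversed<R: Read + Seek>(mut reader: R, chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    let chunk_size = chunk_size.max(1);
    let file_len = reader.seek(SeekFrom::End(0))?;
    let mut pos = file_len;
    let mut buf = vec![0u8; chunk_size];
    // Bytes between `pos` and the start of the most recently emitted line.
    let mut pending: Vec<u8> = Vec::new();
    let mut lines = Vec::new();

    while pos > 0 {
        let len = chunk_size.min(pos as usize);
        pos -= len as u64;
        reader.seek(SeekFrom::Start(pos))?;
        reader.read_exact(&mut buf[..len])?;

        // Prepend the freshly read chunk to what is still pending.
        let at_end_of_file = pos + len as u64 == file_len;
        let mut block = buf[..len].to_vec();
        block.extend_from_slice(&pending);
        pending = block;
        // A newline terminating the file does not start another line.
        if at_end_of_file && pending.last() == Some(&b'\n') {
            pending.pop();
        }

        // Everything after a newline is a complete line: emit it, last first.
        while let Some(idx) = pending.iter().rposition(|&b| b == b'\n') {
            lines.push(pending[idx + 1..].to_vec());
            pending.truncate(idx);
        }
    }
    // What remains is the first line of a non-empty file.
    if file_len > 0 {
        lines.push(pending);
    }
    Ok(lines)
}
```

A lazy iterator would keep `pos` and `pending` as state and refill only when no complete line is left, which keeps memory bounded for large logs.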

Refs Writing

  • git file lock
    • cargo uses flock for comparison with different semantics.
    • fslock seems a bit newer and has a few tests
    • fs2 does not compile anymore and seems unmaintained for years now. Can do more than we need, too.
    • file-lock is posix only but uses fcntl under the hood.

Signal-Hook

  • The use of mutexes is unsafe as the current thread might be interrupted while holding the mutex. When trying to obtain a lock in the handler the thread will inevitably deadlock.
  • Memory allocation and deallocation are not allowed! So inside a handler we have to call std::mem::forget on values that would otherwise be dropped, as dropping them could deallocate.
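Given these constraints, a common pattern is to have the handler do nothing but set an atomic flag that long-running code polls. The sketch below shows only that pattern with the standard library; `on_signal`, `sum_until_interrupted` and `Interrupted` are hypothetical names, and the actual registration of the handler with the operating system (for instance through signal-hook) is deliberately left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Returned when the operation stopped early because the flag was set.
#[derive(Debug, PartialEq)]
pub struct Interrupted {
    /// Number of items handled before the interruption was noticed.
    pub processed: usize,
}

/// All a signal handler would do in this scheme: one atomic store.
/// No mutex is taken and no memory is allocated or freed.
pub fn on_signal(should_interrupt: &AtomicBool) {
    should_interrupt.store(true, Ordering::SeqCst);
}

/// A long-running operation that checks a flag local to the call
/// instead of relying on global state.
pub fn sum_until_interrupted(items: &[u32], should_interrupt: &AtomicBool) -> Result<u32, Interrupted> {
    let mut sum = 0;
    for (processed, item) in items.iter().enumerate() {
        if should_interrupt.load(Ordering::SeqCst) {
            return Err(Interrupted { processed });
        }
        sum += item;
    }
    Ok(sum)
}
```

Passing the flag as a parameter lets an application decide whether it is shared globally or scoped to a single operation.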

Done Tasks

  • prodash
    • replace usage of ctrlc that starts yet another thread with the signal-hook iterator to process pending events from time to time as part of the ticker thread. Saves a thread and enables proper handler chaining.
  • git-features
    • Replace ctrlc usage with signal-hook (i.e. current atexit handler for interrupts)
    • don't use stdout in interrupt handler as it uses a mutex under the hood. Instead allow aborting after the second interrupt in case the application is not responding. It would be great to have a lock-free version of stderr though.
    • Integrate 'git-tempfile' behind feature toggle to allow interrupt handlers to be tempfile handler aware and not interfere.
    • replace existing usage of git_features::interrupt::is_interrupted() with versions of it that are local to the method or function.
    • move git-features::interrupt into git-repository as this kind of utility is for application usage only. There the git-tempfile integration makes sense, too.
  • git-tempfile
    • registered tempfile support to allow deletion on exit (and other signals). Use dashmap as storage.
    • Make sure pid is recorded to assure forking works as expected.
    • docs
    • fix windows build
    • a test validating default handlers are installed
    • release
    • race-proof creation of directories leading to the tempfile
    • a way to use the above for actual tempfiles
    • race-proof deletion of empty directories that conflict with the filename
    • a way to use the above for actual tempfiles
    • differentiate between closed and writable tempfiles in the typesystem to make choice permanent
    • a way to not install any handlers so that git-repository interrupt can run the tempfile removal itself right before aborting.
    • Make with_mut less cumbersome to use by assuming the interrupt handler will indeed abort.
  • git-lock - a crate providing git-style lock files.
    • lock file for update
    • marker for holding a lock
    • exponential backoff
    • the above with randomization
    • actual retries with blocking sleep
    • test for the above
  • git-refs
    • sketch transaction type
    • figure out whether or not to 'extend' the API to include changes from Symbolic refs to peeled ones in transactions
    • git signature parsing code is shared and moved to git-actor
    • git-object uses git-actor
    • git-object: unify nom error handling everywhere (to reuse the nom error handling machinery instead of re-inventing it)
    • git-object can use verbose errors and () - unit errors per feature toggle.
    • parse ref log line
    • reflog forward iteration
    • reflog backward iteration
    • file reflog writing
    • git-tempfile close (Handle<Writable> -> Handle<Closed>)
    • git-lock File close and Marker persist
    • an API to access ref logs for a reference
    • create single symbolic ref without reflog
    • split refs and reusable edit preprocessing
    • delete refs with reflog handling
    • handle parent links for 'old' oid in the log of parent refs
    • handle parent links for error messages of reference names (for lock errors at least)
    • Figure out how to deal with 'previous-value' ambiguity with create-or-update modes.
    • git-lock commit() is recoverable
    • commit()'ing onto empty directories can delete the directory in git-ref
    • internal reflog writing or appending for locked refs
    • persisting lock file onto an empty directory deletes the empty directory and tries again
    • create or update refs with reflog handling
    • research different mmap implementation but ultimately stick to fast-and-simple filebuffer
    • packed-refs iteration - important for being able to read all refs during packfile negotiation
    • iter packed refs from separately loaded buffer
    • iter loose refs with prefix
    • packed-refs lookup with binary search (full-paths)
    • packed-refs lookup with binary search (partial-paths), following lookup rules
    • re-add perf test of sorts, see script to generate big pack file
      • ~6.2mio/s in iteration and 720k/s for lookups/finds using full paths
    • use binary search to find start point for packed prefix iteration
    • iterate all refs (including packed ones)
    • the above, with prefix filtering
    • find_one uses packed-refs if available (use appropriate strategy for reading in full or mapping)
    • remove and test remaining todos
    • packed-refs writing and integration with transaction (must be) - deletions have to be propagated, updates only go to refs (I think, check)
    • #138
    • #139
    • #140
    • #152
    • Make sure broken/invalid loose refs don't break ref iteration and have a way to find them
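The packed-refs lookup by full path listed under git-refs relies on the records being sorted by reference name. A rough sketch of that idea follows, assuming records of the form `<oid> <name>`, a `#` header line and `^<oid>` peeled lines. The function `find_packed` is hypothetical, and unlike a real implementation it first splits the whole buffer into records, which is linear work; searching the mapped buffer directly avoids that.

```rust
/// Look up the object id of `full_name` in packed-refs content whose
/// records are sorted by reference name.
pub fn find_packed<'a>(packed_refs: &'a str, full_name: &str) -> Option<&'a str> {
    // Keep only "<oid> <name>" records, skipping the "# ..." header
    // and the "^<oid>" lines that hold peeled tag targets.
    let records: Vec<(&str, &str)> = packed_refs
        .lines()
        .filter(|line| !line.starts_with('#') && !line.starts_with('^'))
        .filter_map(|line| line.split_once(' '))
        .collect();
    // Binary search by name, which is valid only because the records are sorted.
    records
        .binary_search_by(|(_, name)| name.cmp(&full_name))
        .ok()
        .map(|idx| records[idx].0)
}
```

Lookups by partial path would try each candidate full name in turn, following the lookup rules.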

Byron avatar Jun 13 '21 01:06 Byron

What's the status on gixp clone? I'm very much interested in helping out on that front.

Nytelife26 avatar Sep 29 '21 00:09 Nytelife26

gixp clone as it's seen here would only clone bare repositories. The biggest requirement for achieving work tree checkouts is to implement git-index. Doing so requires a serious investment in time and great attention to detail. There may be smaller tasks on the way but ultimately, git-index is what's needed to clone a repository with work tree.

If this outlook isn't too frightening for you, I'd be happy to get you involved in some capacity.

Byron avatar Sep 29 '21 00:09 Byron

I have never contributed to gitoxide so I'm not too familiar with it yet, but I learn things quickly - nothing frightens me :) so yes, I'm more than happy to try things out if you give me some pointers in the right direction.

Nytelife26 avatar Sep 29 '21 22:09 Nytelife26

Have you had a chance to check out the backlog here? https://github.com/Byron/gitoxide/projects/1

A good way to get acquainted with gitoxide would probably be to use it by further oxidizing some crates that are using git2 ATM but could already use gitoxide. This would inevitably lead to some features being implemented or improved along the way.

~~Speaking of feature, I think desperately needed is commit ancestor traversal sorted by commit time.~~

A way forward would be for you to find something you are comfortable getting started with, then we could even kick it off in a 1:1.

Just let me know.

PS: I connected to you on keybase, a way to reach out to me in a more realtime and private fashion, as needed.

Byron avatar Sep 30 '21 07:09 Byron

@Nytelife26 @Byron Have you had the chance to make progress on this one? :)

pwnorbitals avatar Mar 20 '22 15:03 pwnorbitals

All building blocks for a bare clone exist, they haven't been put into a cohesive package though.

A non-bare clone is in the works which will include the bare one by its very nature.

Byron avatar Mar 21 '22 01:03 Byron