git2-rs Self-referential repo + tree

I've found that I often need to store a Repository and a Tree<'repo> that borrows from it. Unfortunately, this creates unsafe self-referential structs.

It would be helpful if this crate provided a safe abstraction for repo + tree together, e.g.

let repo = git2::Repository::open()?;

/// makes RepositoryWithTree
let repo = repo.with_tree(|repo| {
	repo.find_commit(head)?.tree()
});

*repo // Derefs to Repository
repo.tree(); // returns borrowed Tree

Alternatively, there's owning-ref that has OwningHandle trait. It would require git2 types to implement its traits.

Sep 18 '20 14:09 kornelski

This I believe is one of the major design flaws of the git2 crate right now. It was one of the first pieces of Rust code I ever wrote and I was pretty excited to use lifetimes. I think that reasons like this make it so it wasn't the right choice for all types. Ideally Tree would retain a strong reference to the repository or owner from which it came so it wouldn't have a lifetime. Similarly objects and other "major" items shouldn't have lifetimes but they do.

I think it would be reasonable though to implement traits from owning-ref behind an optional feature, however, in the meantime.

Sep 22 '20 17:09 alexcrichton

Ideally Tree would retain a strong reference to the repository or owner from which it came so it wouldn't have a lifetime.

That would create a potentially misleading interface, where you could hold onto what you think is a standalone object, but that keeps various other large objects around for longer than expected. It would also add to the size of those objects, and some programs keep a large number of objects around.

I appreciate that git2 uses lifetimes the way it does. Personally, I'm hopeful that some future polonius-based rustc is capable of handling self-references "natively"; until then, using something like owning-ref seems reasonable, and perhaps git2 could provide some additional support to make that easier.

Dec 08 '20 07:12 joshtriplett

Just adding my 2c worth of experience...

I just ran into this, tried making it work with thread-local statics (doesn't work because LocalKey::with doesn't let the reference to the inner value escape the closure), tried making it work with OwningHandle (realised I'd need to throw in some unsafe to make that work, which I'm trying to avoid in this project), and eventually ended up finding this issue.

My use case is a cache that persists for the lifetime of my program, and includes things like the current working tree diff so I don't need to recompute it all the time (once per thread would be fine) or pass it around to all the places that need it. So I'm trying to find a way to stash a Repository and a Diff together somewhere with a 'static lifetime, that also doesn't require those structs to be Send.

I feel like I'm missing something simple here, but I keep running into dead-ends. 😊

Mar 22 '21 02:03 jeffparsons

This I believe is one of the major design flaws of the git2 crate right now. It was one of the first pieces of Rust code I ever wrote and I was pretty excited to use lifetimes. I think that reasons like this make it so it wasn't the right choice for all types. Ideally Tree would retain a strong reference to the repository or owner from which it came so it wouldn't have a lifetime. Similarly objects and other "major" items shouldn't have lifetimes but they do.

Hey, I got a question which might sound silly. Why exactly are the lifetimes or references to the objects owner required at all? Maybe I am misreading the documentation of libgit2 (having never worked with it directly), but as far as I can understand the objects can exist on their own, as we are required to free almost all of them ourselves anyways, right?

I am currently getting bitten pretty hard by the lifetimes (and also the missing Send impls for objects, but a topic for another day) and would consider putting some work into a PR (if there is interest), ripping the lifetimes out for stuff like commits, objects, trees etc. But my C-knowledge is rudimentary enough to potentially be dangerous, so I'd definitely need some Feedback on the general approach.

Whatever the case, thank you (and all others who worked on this) a lot for this library :)

Dec 19 '21 22:12 UhhhWaitWhat

Maybe I am misreading the documentation of libgit2 (having never worked with it directly), but as far as I can understand the objects can exist on their own, as we are required to free almost all of them ourselves anyways, right?

TL;DR: you can't be sure about these things without diving into the libgit2 documentation and/or source.

First of all: deallocation/destruction being the library user's responsibility or not generally has nothing to do with whether an object can life on it's own or not. Especially not in C.

In languages like C, an object (e.g. tree or commit in this case) may contain arbitrary references to data held alive by another object (e.g. the repository). Free the repository while still having a commit in use and you'll end up with a use-after-free bug. Ideally, the documentation would contain some informal description of the requirements on the order of destruction, but as documentation is written by humans it may still be incomplete, and more often than not it doesn't make any declarations regarding this particular topic at all. In the case of libgit2, I'd have a closer look at some more general documentation they provide, then maybe the library reference (though you probably will not find any useful info regarding lifetimes there).

You'll probably need to dive into the source yourself, find out what each object references, whether you have shared data and who's responsible for desctucting and/or deallocating that data. You may encounter some form of reference counting (think Rc or, if you're lucky, Arc), which allows "safely" depending on data originally part of the repository after freeing it, but that's not a given. You really have to find out yourself. And maybe extend the documentation of libgit2 along the way.

Dec 20 '21 12:12 neithernut

Hi thank you a lot for the detailed explanation. The more I learn about C the less I understand how anyone ever manages to write correct or at least semi-correct software in it.

I'll definitely take a better look at the code and documentation to learn then, but I am also even more scared to touch any of it, at least if others might depend on my work in the future. So no PR to this project by me :)

Dec 20 '21 19:12 UhhhWaitWhat

git2-rs git2-rs copied to clipboard

Self-referential repo + tree

git2-rs
git2-rs copied to clipboard