git-subtrac
Improve README
Currently, the README seems quite clear ... for people who know the nuts and bolts of git. But for not-so-good developers (like me), things seem a little ... opaque.
Can you add an example of how subtrac transforms a repository (typically, I would like to see how the submodules are transformed into valid commits)? I would also like to see how to add new submodules to a project where subtrac is already used.
I think some graphical explanation would clarify the merits of git-subtrac.
It's notoriously difficult for someone who really understands the guts of a system to write clear documentation for people who don't. If someone would like to try updating the README in a pull request, I'd be happy to accept.
Otherwise I can make some attempt in the next few days, but it won't be perfect :)
ok, but first, let's try to see if I understand correctly.
Suppose I have a repository (let's call it myProject) with one branch (let's say master) and in that branch a submodule called mod.
On the initial run of subtrac
- a branch called master.trac will be created
- in that branch, a commit will be created having for parents both the commit adding the submodule in master AND the commit in the mod project that created the data.
Is that ok?
And then, how will updates happen? Should I go into the master.trac branch, or is there any other operation that allows me (from the master branch) to update the submodule?
On a technical side, I guess I'll be able to draw that using https://gitgraphjs.com
Something that also should be explained is how to make changes in a subrepo and push those changes to the parent repository with or without pushing the changes in the subrepo to upstream.
ok, seems like my understanding of subtrac is not good enough to even try to document it. Let me try to see if I understand correctly ...
The problem subtrac tries to solve is that submodules are
- Scattered in repository
- Have independent history from the repository
So what subtrac does is rewrite submodules as commits with multiple parents: one is the classical commit parent, and the other(s?) is the commit of the submodule source. But this is done in the famous .trac branch (for reasons I don't yet understand).
However, I have at least understood that, if I want to add a submodule to my project, I do as usual then call git subtrac update to have that new submodule transformed into a commit into the .trac branch. I guess that there is also a link between the commit in the .trac branch and the commit in the standard branch, to have the link correctly memorized.
Well, I'm gonna test all that to make sure I understand things correctly.
Thanks for volunteering to help. Let me try to answer your questions in bulk. Not sure whether this is the ideal medium or not for a complex discussion, but we can give it a shot!
I'll explain using git techie talk for now. Let me know if there are any parts that you don't understand. Hopefully you'll have some luck paraphrasing it into something more comprehensible to normal people :)
Can you add an example of how subtrac transforms a repository (typically, I would like to see how the submodules are transformed into valid commits)? I would also like to see how to add new submodules to a project where subtrac is already used.
So, the first thing to note is that subtrac doesn't actually transform anything. There are separate DAGs of commits (which each point at trees, which each point at trees and blobs and maybe submodule commits). The DAGs are:
- the parent project (aka supermodule). This is the toplevel one that refers (somewhere in its tree) to submodule commits. If there are recursive submodules (ie. submodules that point to further submodules), then a submodule can also be a supermodule. A supermodule can point, at different points in its history, to different points in the history of its submodules.
- each submodule also has its own completely separate history (DAG of commits).
Normally, these separate DAGs are stored in separate git repositories: one per sub/super module. But there is nothing stopping you from fetching one git project's history into another; you can do it at any time with just 'git fetch https://...whatever... master' for example, and then FETCH_HEAD will refer to the history of a whole different module.
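To make the "fetch one project's history into another" point concrete, here is a small sketch using two throwaway local repositories in place of the https://... URL. The repository names and the temp directory are only for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two unrelated projects, each with its own history.
for repo in myProject mod; do
    git init -q "$repo"
    git -C "$repo" symbolic-ref HEAD refs/heads/master
    echo "$repo" > "$repo/README"
    git -C "$repo" add README
    git -C "$repo" commit -q -m "initial commit of $repo"
done

# Fetch mod's history into myProject; nothing is merged or checked out.
git -C myProject fetch -q ../mod master

mod_head=$(git -C mod rev-parse HEAD)
fetched=$(git -C myProject rev-parse FETCH_HEAD)
echo "mod HEAD:   $mod_head"
echo "FETCH_HEAD: $fetched"
```

After this, myProject's object store contains mod's commit even though no branch in myProject points at it; only FETCH_HEAD does.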
What git-subtrac does is construct a new DAG (the trac branch), with new commit objects. Those new commit objects refer to the submodule commit objects that were referred to by supermodule trees. In the trac branch, instead of referring to the submodule commits via trees, we refer to them via commit parents.
In a sense, the trac branch is created by "transforming" the supermodule commit DAG. However, it's important to note that:
- the trac branch does not contain any of the supermodule's commits, trees, or blobs. If you push a copy of the trac branch somewhere, it links (directly or indirectly) to all your submodules, but not to any of the code in the supermodule.
- thus, the original supermodule DAG is still needed if you want the supermodule's code. This is different from git-subtree, which does a similar sort of transformation, but you merge the resulting new DAG into your supermodule's original DAG as if it were a branch.
- thus, the original supermodule DAG and all the submodule DAGs still exist and are still needed. We just have one more DAG, and it happens to also refer to all the submodule DAGs.
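The idea of referring to submodule commits through commit parents rather than through trees can be sketched by hand with plain git plumbing. This only illustrates the shape of such a commit; it is not what git-subtrac actually does internally. The empty tree, the commit message, and the hand-built `master.trac` ref are all assumptions made for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in submodule with one commit.
git init -q mod
echo "library code" > mod/lib.txt
git -C mod add lib.txt
git -C mod commit -q -m "mod: first commit"
sub_commit=$(git -C mod rev-parse HEAD)

# A supermodule whose tree links to that commit (mode 160000 = submodule link).
git init -q super
git -C super symbolic-ref HEAD refs/heads/master
echo "app code" > super/app.txt
git -C super add app.txt
git -C super update-index --add --cacheinfo 160000,"$sub_commit",mod
git -C super commit -q -m "super: link mod"
super_commit=$(git -C super rev-parse HEAD)

# Bring the submodule's objects into the supermodule's object store.
git -C super fetch -q ../mod HEAD

# Hand-build one synthetic commit: empty tree (an assumption of this sketch),
# with the submodule commit as its only parent.
empty_tree=$(git -C super mktree </dev/null)
trac=$(git -C super commit-tree "$empty_tree" -p "$sub_commit" -m "synthetic: refs mod")
git -C super update-ref refs/heads/master.trac "$trac"

# The synthetic branch reaches the submodule's history...
if git -C super merge-base --is-ancestor "$sub_commit" master.trac; then
    reaches_sub=yes
else
    reaches_sub=no
fi
# ...but not the supermodule's.
if git -C super merge-base --is-ancestor "$super_commit" master.trac; then
    reaches_super=yes
else
    reaches_super=no
fi
echo "reaches mod history:         $reaches_sub"
echo "reaches supermodule history: $reaches_super"
```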
Suppose I have a repository (let's call it myProject) with one branch (let's say master) and in that branch a submodule called mod.
On the initial run of subtrac
- a branch called master.trac will be created
- in that branch, a commit will be created having for parents both the commit adding the submodule in master AND the commit in the mod project that created the data.
Almost: the new commit does not include the commit that adds the submodule. It only includes the submodule's commit.
And then, how will updates happen? Should I go into the master.trac branch, or is there any other operation that allows me (from the master branch) to update the submodule?
Whenever you update the submodule links from the supermodule (in the usual way for submodules: git commit in the submodule, then git add + commit in the supermodule), you need to run 'git subtrac update'. This will extend the trac branch to include any new submodule commits referenced by your new supermodule commit, if there are any.
If you make a commit to the supermodule that doesn't change any submodule links, the trac branch will be unchanged, because of the way git-subtrac prunes redundant commits.
If instead of adding one commit to the submodule, you rebase or revert or reset or switch branches or whatever, exactly the same thing happens: exactly one new commit is added to the trac branch that references the new top-level commit in the submodule branch, referred to by the one new commit in the supermodule branch.
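As a command sequence, the update cycle described above might look roughly like this. It's a sketch with made-up repository names (`super`, `mod`) in a temp directory; the `protocol.file.allow` override is only there because the demo clones the submodule from a local path, and the `git subtrac update` step is skipped if git-subtrac isn't installed.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "upstream" for the submodule, with one commit.
git init -q mod
echo "v1" > mod/lib.txt
git -C mod add lib.txt
git -C mod commit -q -m "mod: v1"

# Supermodule with one commit of its own, then mod added as a submodule.
git init -q super
echo "app code" > super/app.txt
git -C super add app.txt
git -C super commit -q -m "super: initial commit"
git -C super -c protocol.file.allow=always submodule --quiet add "$work/mod" mod
git -C super commit -q -m "super: add mod"

# 1. Commit in the submodule...
echo "v2" > super/mod/lib.txt
git -C super/mod commit -q -am "mod: v2"

# 2. ...then record the new submodule commit in the supermodule.
git -C super add mod
git -C super commit -q -m "super: bump mod to v2"

# 3. Let git-subtrac extend the trac branch (skipped if not installed).
if command -v git-subtrac >/dev/null 2>&1; then
    (cd super && git subtrac update)
else
    echo "git-subtrac not found; skipping 'git subtrac update'"
fi

linked=$(git -C super rev-parse HEAD:mod)
sub_head=$(git -C super/mod rev-parse HEAD)
echo "supermodule links mod at: $linked"
echo "submodule HEAD is:        $sub_head"
```

Steps 1 and 2 are ordinary submodule usage; step 3 is the only extra habit to pick up.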
Note that you can look at the trac branch in something like gitk and it... sort of makes sense. It gets a little tangled :)
On a technical side, I guess I'll be able to draw that using https://gitgraphjs.com
Cool, I've never seen that before! I don't know if it'll work with a README.md (will it?) but maybe it would work with readthedocs.io or something. Or just a blog post somewhere that we can link to.
Something that also should be explained is how to make changes in a subrepo and push those changes to the parent repository with or without pushing the changes in the subrepo to upstream.
Yes, this is a very important feature. Note that the trac branch contains the complete history of the superproject's references to its submodules, no matter how many rebases or branch changes or resets they might have gone through.
You can maybe think of it like 'git reflog' - the submodule history doesn't just "fast forward" like with git-subtree or git merge. You can roll it backwards, sideways, etc, and git-subtrac remembers each step along the way (or rather, each step where you snapshotted it with a commit in the supermodule).
Another really important note is that branches (or tags, refs, etc) don't matter at all in the submodules anymore. The supermodule refers to commit ids in the submodules, not branches. When you branch the supermodule, you also (effectively) branch the submodules at the same time, and you can do whatever you want in the submodules without affecting any other branches. The only catch is that you need a .trac branch for every branch in the supermodule.
(When you merge your supermodule branch back into master or whatever, the .trac branch will reflect the same merge. 'git subtrac update' will regenerate the .trac branch from scratch, but the result will be a branch that looks like a merge of the master.trac and feature.trac branches.)
As a result, forks and branches work exactly the same way. Someone can fork your supermodule on github and make their own, and they can edit the submodules as much as they want as part of the fork, and you can merge the fork back in later (as long as you also git subtrac update after the merge).
Now, as for the "upstream" copy of the submodule, that's just a different version of the submodule's history, with its own branches. At any time, you can cd into the submodule from your supermodule, and do something like git push github.com:... HEAD:master and send the current version upstream. But this isn't necessary; it doesn't affect anything about the superproject. It's just another copy of the repo.
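As a sketch of that optional upstream push, with a local bare repository standing in for the github.com:... remote (the names `upstream.git` and `mod` are just for the demo):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the submodule's upstream: an empty bare repository.
git init -q --bare upstream.git

# Stand-in for the submodule checkout inside the supermodule.
git init -q mod
echo "local fix" > mod/lib.txt
git -C mod add lib.txt
git -C mod commit -q -m "mod: local change"

# Optional: publish whatever is currently checked out as upstream's master.
# The fully qualified destination also works when HEAD is detached and the
# branch does not exist upstream yet.
git -C mod push -q ../upstream.git HEAD:refs/heads/master

local_head=$(git -C mod rev-parse HEAD)
upstream_head=$(git -C upstream.git rev-parse refs/heads/master)
echo "submodule HEAD:  $local_head"
echo "upstream master: $upstream_head"
```

Nothing in the supermodule changes as a result of this push.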
The problem subtrac tries to solve is that submodules are
- Scattered in repository
- Have independent history from the repository
So what subtrac does is rewrite submodules as commits with multiple parents: one is the classical commit parent, and the other(s?) is the commit of the submodule source. But this is done in the famous .trac branch (for reasons I don't yet understand).
(I'm answering all the questions in sequence in this mega-comment; this is a duplicate but I'll answer it here anyway :)) Almost true, except as mentioned above, the .trac branch does not use the superproject commit as a parent. It only uses the submodule commits as parents, as well as other synthetic trac commits if needed.
However, I have at least understood that, if I want to add a submodule to my project, I do as usual then call git subtrac update to have that new submodule transformed into a commit into the .trac branch.
Not exactly transformed: augmented. We build a new DAG of commits that includes the entire, original, DAG of commits from the submodule history.
I guess that there is also a link between the commit in the .trac branch and the commit in the standard branch, to have the link correctly memorized.
Actually no, there is no link. git submodule links are links from a tree to a commit. That is, a "directory" of your supermodule is a commit in a submodule (which contains its own files, so it appears as a directory in your worktree). A normal submodule link therefore is not referenced from the supermodule commit directly; only indirectly through the commit's tree or one of its subtrees.
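You can see the tree-to-commit link with plain git. The sketch below records a submodule link directly in the index (bypassing `git submodule add` to keep it short; the repository names are made up), then prints the commit object and its tree.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in submodule with one commit.
git init -q mod
echo "library code" > mod/lib.txt
git -C mod add lib.txt
git -C mod commit -q -m "mod: first commit"
sub_commit=$(git -C mod rev-parse HEAD)

# A supermodule whose index gets a submodule entry (mode 160000) for "mod".
git init -q super
echo "app code" > super/app.txt
git -C super add app.txt
git -C super update-index --add --cacheinfo 160000,"$sub_commit",mod
git -C super commit -q -m "super: link mod"

# The commit object names only its tree (plus author, committer, message)...
git -C super cat-file -p HEAD
# ...and the tree is where the submodule commit id shows up.
git -C super ls-tree HEAD

entry=$(git -C super ls-tree HEAD mod)
```

The `mod` line in the `ls-tree` output has mode 160000 and type `commit`, followed by the submodule's commit id.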
The .trac branch is a bunch of synthetic objects, but none of those objects are needed in order to checkout the supermodule or any of its submodules. It's purely a bookkeeping mechanism, telling git how to push or pull all the submodule commits to wherever you want to push or pull them, in a nice bundle. But the trac DAG is never referred to by anything in the supermodule DAG (which is unmodified), nor vice versa. You need them both.
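Here is a rough sketch of the "nice bundle" effect using plain git. The synthetic commit is built by hand as a stand-in for a real trac commit (the empty tree and the message are assumptions, not git-subtrac's actual format), and `backup.git` is a hypothetical remote.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Submodule history (one commit) and a supermodule that links to it.
git init -q mod
echo "library code" > mod/lib.txt
git -C mod add lib.txt
git -C mod commit -q -m "mod: first commit"
sub_commit=$(git -C mod rev-parse HEAD)

git init -q super
git -C super symbolic-ref HEAD refs/heads/master
echo "app code" > super/app.txt
git -C super add app.txt
git -C super update-index --add --cacheinfo 160000,"$sub_commit",mod
git -C super commit -q -m "super: link mod"
super_commit=$(git -C super rev-parse HEAD)

# Hand-made stand-in for a trac commit: its parent is the submodule commit.
git -C super fetch -q ../mod HEAD
empty_tree=$(git -C super mktree </dev/null)
trac=$(git -C super commit-tree "$empty_tree" -p "$sub_commit" -m "synthetic: refs mod")
git -C super update-ref refs/heads/master.trac "$trac"

# Push only the synthetic branch to an empty remote.
git init -q --bare backup.git
git -C super push -q ../backup.git master.trac

# The submodule commit travelled along with it...
if git -C backup.git cat-file -e "$sub_commit" 2>/dev/null; then has_sub=yes; else has_sub=no; fi
# ...but the supermodule commit did not.
if git -C backup.git cat-file -e "$super_commit" 2>/dev/null; then has_super=yes; else has_super=no; fi
echo "remote has submodule commit:   $has_sub"
echo "remote has supermodule commit: $has_super"

# Pushing master as well gives the remote both DAGs.
git -C super push -q ../backup.git master
```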
(It's tempting to think about merging the trac branch into the supermodule DAG, so then you only have to push and pull a single branch! But that would clutter your project history horribly, and everyone hates that. You'd see every single commit from every single submodule whenever you 'git log' in your supermodule. Bleh.)
Phew. Hope this helps! If you can make that more readable, I'll be very impressed and thankful :)
I've started a weird webpage (to be used in gh-pages). Currently, I have a small example of the submodule problem. You can take a look here: http://nicolas-delsaux.hd.free.fr/subtrac/ What do you think?