git-meta
Provide option to enable pushing submodule commits to a branch of the same name as the destination meta commit
Our typical workflow depends heavily on branches across multiple repos. The meta repo state recorded for every commit should include not only the commit hash of each subrepo, but also its branch. This branch info is pretty important.
Envisioned workflow:
git meta checkout -b release_v1
for all subrepos:
git checkout -b release_v1
git push -u origin release_v1
git meta commit -a -m "Created release_v1 branch" # records "current" branch for all open subrepos
git meta push
The next time we need to do a hotfix on the release branch, what I'd like to do:
git clone meta
git meta checkout release_v1
git open some_subrepo # Exactly the same as today, but create a local branch with the name of the recorded branch. Set tracking of the local branch to remote branch (if it exists).
At this point, if someone else has directly made updates to branch some_subrepo/release_v1, I can just do a "git pull --ff-only" to bring things up to date.
Without this branch info, we have to manually guess or somehow record as part of the commit what the "working branch" was at the time of the "git meta commit". A meta branch could potentially mix and match subrepo branches (i.e. meta:feature_a = subrepo1:master + subrepo2:feature_a + subrepo3:feature_b). On a new clone, we want to know which branch we should continue to work on for each subrepo.
The "git meta" can remain lightweight and not push branch names upstream, and leave this as a manual step for each subrepo.
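As an illustration of the information being asked for, the sketch below prints the current branch and commit of each submodule checkout. It is not part of git-meta: `record_branches` is a hypothetical helper name, and the output format is an assumption made up for this example.

```shell
# Hypothetical helper, not part of git-meta.
# Prints "<dir> <branch> <commit>" for each given submodule directory.
# A detached HEAD is reported as "(detached)".
record_branches() {
    for dir in "$@"; do
        # symbolic-ref fails quietly (-q) when HEAD is detached.
        branch=$(git -C "$dir" symbolic-ref --short -q HEAD) || branch="(detached)"
        commit=$(git -C "$dir" rev-parse HEAD) || return 1
        printf '%s %s %s\n' "$dir" "$branch" "$commit"
    done
}
```

Called as `record_branches subrepo1 subrepo2 subrepo3`, it would list one line per subrepo, which is the kind of mapping a meta commit would have to store to support the mix-and-match case above.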
The architecture doc explains why we don't do this: because then you have (as you note) the possibility of "shear" between the submodule branches and the meta branches. If some_subrepo has branch release_v1 set to commit X, but the meta repo's branch release_v1 is set to commit Y, who wins? The only possible answer is the meta repo, because that's the only one we can update atomically.
The idea is that you never make submodule commits outside of the context of the meta repo. That's what git meta is for: to make it easy to make submodule commits from within the meta repo.
I read through the architecture doc in detail. Maybe I'm missing something, but how is this different from your local repo branch pointing to one commit, and the remote repo pointing to a different commit? If the branches have diverged, then "git push" fails until you resolve the divergence through rebase or merge.
In this case, the potential shear is between the meta repository's branches and the submodules' branches. You can't push atomically to the meta and submodules, or to multiple submodules (without weird custom server stuff, anyway). So it's possible for them to get out-of-sync. The question is: what are the semantics of this? We solve the problem by ignoring submodule branches, and only considering meta branches. (Inside Two Sigma, we do have a cronjob that populates submodule branches from meta repo branches, just for ease of browsing, but it's kind of a hack).
Yes, I see the race condition/atomic problem across multiple repos if you actually try to push updates to branch heads. I'm thinking more along the lines of a synthetic meta branch head. On meta commit, record the commit hash plus current branch_name. This hash may or may not match origin/branch_name; we don't really care. We allow divergence with origin, and maybe just print a warning. On "git meta open", do the equivalent of:
git checkout -b branch_name hash
git branch --set-upstream-to origin/branch_name
At this point, the branch may be in a divergent state with respect to origin, but this is true today. The only difference is that you know what the original branch was for your rebase or merge operation, instead of having to guess.
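The two commands above could be wrapped in a small helper along these lines. This is only a sketch: `open_on_branch` is a hypothetical name, not something git-meta provides, and it assumes the caller already knows the recorded branch name and commit hash.

```shell
# Hypothetical helper, not part of git-meta: put a submodule checkout on a
# local branch named after the recorded branch, at the recorded commit.
open_on_branch() {
    repo_dir=$1 branch=$2 commit=$3
    # Create the local branch at the recorded commit (fails if it already exists).
    git -C "$repo_dir" checkout -q -b "$branch" "$commit" || return 1
    # Track origin/<branch> only if that remote-tracking ref exists.
    if git -C "$repo_dir" show-ref --verify --quiet "refs/remotes/origin/$branch"; then
        git -C "$repo_dir" branch -q --set-upstream-to="origin/$branch" "$branch"
    else
        echo "warning: origin/$branch not found; $branch has no upstream" >&2
    fi
}
```

For example, `open_on_branch some_subrepo release_v1 <hash>` would leave the subrepo on `release_v1`, possibly diverged from `origin/release_v1`, exactly as described above.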
You don't need the original branch for your merge/rebase, because you can use git meta merge or rebase, which works on meta commits.
But if this is really something that seems exciting, you might be able to do this with hooks. But I think it would be confusing to have that sort of divergence.
Could you say a bit more about why you need the original local branch names? Typically, in git, the local branch names are considered ephemeral. They may persist in a commit message like "Merge from feature_a", but once merged, they're typically deleted, and then they age out of the local reflog, and then those names are gone forever. We usually use local branch names as a hint to the local developer, to help them switch between multiple simultaneous in-progress efforts.
It sounds like you have a different use case for local branch names? And it also sounds like you kind of care about remote branch names, but also kind of don't care? If you could say more about what you use the branch names for, maybe we could find a way to support your workflow with git-meta. (In particular, if remote branch shear is not a problem for you....)
Perhaps one could use the submodule.
I believe git-meta currently ignores this property (and we don't set it inside Two Sigma), but it might be easy to make git-meta open do what you're looking for when it's present.
On Sat, Sep 7, 2019, 10:02 Adam Bliss [email protected] wrote:
Could you say a bit more about why you need the original local branch names? Typically, in git, the local branch names are considered ephemeral. They may persist in a commit message like "Merge from feature_a", but once merged, they're typically deleted, and then they age out of the local reflog, and then those names are gone forever. We usually use local branch names as a hint to the local developer, to help them switch between multiple simultaneous in-progress efforts.
It sounds like you have a different use case for local branch names? And it also sounds like you kind of care about remote branch names, but also kind of don't care? If you could say more about what you use the branch names for, maybe we could find a way to support your workflow with git-meta. (In particular, if remote branch shear is not a problem for you....)
We have a fairly large organization with multiple business units and multiple product groups within each business unit. Each product group has its own set of git repositories. We want to move to a monorepo across our entire company, but the scaling issues associated with one giant git repo are not acceptable. The git-meta architectural doc pretty much sums up our own conclusions, and we are now exploring using git-meta.
As a pilot deployment, we would layer git-meta on top of our existing repos. This was one of the bonuses of git-meta; it could coexist with our current multi-repo workflows. However, deploying git-meta without disrupting current multi-repo workflows means that what we see at the git-meta level should be consistent at the multi-repo level, and vice-versa.
We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed. If we hotfix at the meta level, we would need to make sure those hotfixes are reflected at the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merging or rebasing the meta commits, then pushing.
Relying on every developer to manually figure out what branch to check out when synchronizing the subrepo itself would be very error prone. Ideally, when we check out a branch of the meta and do a "git meta open", it would automatically change your local branch appropriately. At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:
cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase # Just works. We are on the correct subrepo branch associated with the meta branch
git push
git meta push
vs
cd meta
git meta open subrepoA
cd subrepoA
git checkout -b <hmmm, what meta directory am I on? release_v1? develop? master? Am I already on the branch, or detached?>
...
We don't need git-meta to do any automatic subrepo branch ref updating. The subrepo branch pushes can be left outside of the scope of git-meta. The only additional functionality is that enough info is recorded in git meta commits so that a git meta open would automatically do the correct "git checkout -b" and "git branch --set-upstream-to" upon a "git meta open".
I think we would accept a patch to do this, as long as it was optional.
We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed.
What happens if two people try to push a hotfix to the same branch at the same time? The architecture doc describes a possible race here: if each of two pushes succeeds in pushing to a different set of repos, they can become permanently deadlocked. Do you force developers to take out a central lock for the duration of the hotfix push?
If we hotfix at the meta level, we would need to make sure those hotfixes are reflected at the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merging or rebasing the meta commits, then pushing.
Makes sense, but it doesn't sound like you actually have any dependency on the local branches in users' submodules. It sounds like it would be enough to ensure that, when pushing meta commit M:m={A:a, B:b} to branch br1, each submodule commit a,b,... must also be at the tip of a branch br1 in its own remote. You could probably do this without any change to client-side git-meta, by simply adding a remote pre-receive hook--something like this:
1. Take the global lock on the name br1.
2. For each submodule {$remote, $commit} in m, in parallel:
   2a. Assert that $remote contains $commit (this is what the current pre-receive hook included in git-meta checks).
   2b. Assert that the current branch br1 in $remote is an ancestor of $commit (else die and reject the push).
3. For each submodule {$remote, $commit} in m, in parallel:
   3a. Fast-forward br1 to $commit in $remote.
4. Release the lock on br1.
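The hook steps above might be sketched in shell roughly as follows. Everything here is an assumption for illustration: the function names are hypothetical, the submodule repos are taken to be local paths on the server, input is one `<repo-path> <commit>` pair per line (paths without spaces), the lock is a plain directory, a branch that does not exist yet is allowed to be created, and the loops run sequentially rather than in parallel. It is not git-meta's actual pre-receive hook.

```shell
# Hypothetical server-side helpers; names, paths and lock scheme are assumptions.

# Steps 2a and 2b: the commit must exist in the submodule's repo, and the
# current tip of the branch (if the branch exists) must be an ancestor of it.
can_fast_forward() {
    repo=$1 branch=$2 commit=$3
    git -C "$repo" cat-file -e "${commit}^{commit}" 2>/dev/null || return 1
    if git -C "$repo" show-ref --verify --quiet "refs/heads/$branch"; then
        git -C "$repo" merge-base --is-ancestor "refs/heads/$branch" "$commit"
    fi
}

# Step 3a: move the branch to the commit.
advance_branch() {
    git -C "$1" update-ref "refs/heads/$2" "$3"
}

# usage: sync_branches <branch> <lock-dir>   (pairs are read from stdin)
sync_branches() {
    branch=$1 lock_dir=$2
    pairs=$(cat)
    # Step 1: mkdir is atomic, so it can serve as a simple lock.
    mkdir "$lock_dir" 2>/dev/null || { echo "$branch is locked" >&2; return 1; }
    rc=0
    # Step 2: verify every submodule before moving any branch.
    printf '%s\n' "$pairs" | while read -r repo commit; do
        [ -n "$repo" ] || continue
        can_fast_forward "$repo" "$branch" "$commit" || exit 1
    done || rc=1
    # Step 3: fast-forward only if every check passed.
    if [ "$rc" -eq 0 ]; then
        printf '%s\n' "$pairs" | while read -r repo commit; do
            [ -n "$repo" ] || continue
            advance_branch "$repo" "$branch" "$commit" || exit 1
        done || rc=1
    fi
    # Step 4: release the lock.
    rmdir "$lock_dir"
    return "$rc"
}
```

Running all checks before any update keeps a rejected push from moving some branches and not others. A failure partway through step 3 can still leave the repos inconsistent, which is the shear problem discussed earlier in the thread.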
There might be some extra complexity around submodules being added/deleted/relocated, especially if you allow multiple commits to be pushed at once.
Also, if not everyone is using git-meta, there's a risk that someone would push a hotfix to the submodules' branches and neglect to update the meta branch. If you already have some exogenous procedure for locking the release branch to push hotfixes, you could probably patch it to ensure integrity with the meta repo.
At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:
cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase # Just works. We are on the correct subrepo branch associated with the meta branch
git push
git meta push
We discourage our users from doing manual pulls and pushes in the
submodules. It quickly causes the meta repo to get into inconsistent states
which are hard for the user to understand. But, as described above, there
should be no need for it. If you have taken care to ensure that the
meta-repo branch br1 always points each submodule s to the head of br1 in
s's remote, then the user can simply git meta pull --rebase to atomically
bring them from one consistent state to the next, and a single git meta push can atomically publish their work.
What happens if two people try to push a hotfix to the same branch at the same time? The architecture doc describes a possible race here: if each of two pushes succeeds in pushing to a different set of repos, they can become permanently deadlocked. Do you force developers to take out a central lock for the duration of the hotfix push?
Currently, pushes happen manually one repo at a time (using the deprecated "gits" for some groups, manually by other groups), so pushes involving multiple repos can be interleaved between two people. At this point, both have to pull (whatever subset of repos has been pushed), compile, run tests, then continue pushing. It's true that for a very short period of time, repos become out of sync, but this is quickly resolved by both parties. We've learned to live with this in our multirepo system. It's similar to a bad push causing compile or QA failures; when it happens, it's the highest priority to fix immediately.
Manually pushing the subrepos, even with git-meta, would maintain the status quo. However, git-meta pushes would be atomic and record states before we get into manually resolving the "race condition", so this would be an improvement to our current system.
We discourage our users from doing manual pulls and pushes in the submodules. It quickly causes the meta repo to get into inconsistent states which are hard for the user to understand.
I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go.
We will need to support both git-meta monorepo and our existing multirepo workflows for the transition period, and have a fallback plan if git-meta proves to be problematic for risk management. I believe that this would be true for any company with well established multi-repo workflows.
As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.
I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go.
Oh believe me, I understand how that can go :)
To be clear, I didn't mean to encourage the whole organization to use
git-meta instead of git. I meant that once a particular user has decided to
use a git-meta clone, it's best if that user sticks to git meta push and
git meta pull in that clone, rather than mixing in raw git submodule
push/pulls. Maybe your users will turn out to be more submodule-savvy than
ours, but most have found it terribly confusing.
As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.
For release branches, if your push rate is low, you may be able to get away with it. But I would like to opine that the biggest benefits of monorepo development do not accrue until you start using git-meta to snapshot all pushes to master. For example, bisecting across the history of the meta repo to find a bug is extremely powerful. (To be fair, this is also when the biggest pains accrue. A good discussion of the tradeoffs is at https://trunkbaseddevelopment.com .)
The problem is when we have some people using git-meta and some not. How does a meta-user push his commits back to the subrepo so that the non-meta-user can see them? It's easy without branching, as there is only one branch (master). However, when there are branches, this branch selection becomes problematic.
I'll play around with branching within meta and construct a usage example where branch info is stored.
Would it suffice if git meta push origin HEAD:refs/heads/foo would
attempt to push each changed submodule to refs/heads/foo in its own
remote? That should be pretty easy to implement, and wouldn't require any
local tracking of branch history.
It will have the consistency problems already discussed, and an additional problem that, without some assistance from a remote hook of some kind, it's impossible for the client to know exactly which submodules need pushing. (Unless you have few enough submodules that you can expect users to open all of them.)
Yes, I think that would do the trick. The local tracking of branch history is not needed. Git meta open should also read this branch and set the subrepo to this local branch name.
Any coordination with the subrepo origin/branch_name would be left up to the user (along with all the pitfalls). We can tool around this part.
Ok, I updated the title to reflect the new goal. I propose the config be named gitmeta.pushSubmoduleBranches and I think it should probably be just a couple-line change around https://github.com/twosigma/git-meta/blob/master/node/lib/util/push.js#L230 . WDYT @novalis ?
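Since the behavior is meant to be optional, a client could gate it on the proposed config key. The sketch below only shows how such a boolean could be read with plain `git config`. The key name `gitmeta.pushSubmoduleBranches` comes from the proposal above, while the helper name is hypothetical and nothing here reflects code that exists in git-meta.

```shell
# Hypothetical check for the proposed (not yet implemented) option.
# Succeeds only when gitmeta.pushSubmoduleBranches is set to a true value
# in the given repo; unset or false means the feature stays off.
push_submodule_branches_enabled() {
    [ "$(git -C "$1" config --bool --get gitmeta.pushSubmoduleBranches 2>/dev/null)" = "true" ]
}
```

A user would opt in with `git config gitmeta.pushSubmoduleBranches true` in the meta repo. With the key unset, the check fails and pushes behave as they do today.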
What do you want to do about the consistency problems? Warn the user when things get inconsistent? That seems fine, I guess.
Yeah, I guess the default should be something like:
- (optional, but probably a good idea): fetch all repos that you plan to push, and check that each push is a fast-forward. If any aren't, fail early with a message like "Please pull and rebase/merge; if that doesn't help, the remote server may be in an inconsistent state".
- Start pushing repos; if any submodule push fails after the first one succeeds, it should finish other submodules(?), abort the meta push, and print a big scary warning that the server may now be inconsistent
- Maybe add a --keep-going flag to complete the meta push and also print the scary warning.
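The default flow listed above could look roughly like this on the client side. It is a sketch under stated assumptions, not git-meta's push code: `push_submodule_branches` is a hypothetical function, the submodules to push are passed in as directories, each has a remote named `origin`, and the keep-going behavior is modeled as a `0`/`1` argument rather than a real `--keep-going` flag.

```shell
# Hypothetical client-side sketch; not git-meta's actual push implementation.
# usage: push_submodule_branches <branch> <keep_going: 0|1> <submodule-dir>...
# Returns 0 when the caller may go on to push the meta repo.
push_submodule_branches() {
    branch=$1 keep_going=$2
    shift 2
    # Pre-check: every push must be a fast-forward of origin/<branch>.
    for dir in "$@"; do
        git -C "$dir" fetch -q origin || return 1
        if git -C "$dir" show-ref --verify --quiet "refs/remotes/origin/$branch" &&
           ! git -C "$dir" merge-base --is-ancestor "refs/remotes/origin/$branch" HEAD; then
            echo "Please pull and rebase/merge in $dir" >&2
            return 1
        fi
    done
    # Push every submodule, continuing past failures so the rest still finish.
    failed=0
    for dir in "$@"; do
        git -C "$dir" push -q origin "HEAD:refs/heads/$branch" || failed=1
    done
    if [ "$failed" -ne 0 ]; then
        echo "WARNING: some submodule pushes failed; the server may now be inconsistent" >&2
        [ "$keep_going" -eq 1 ] || return 1
    fi
    return 0
}
```

A caller might run `push_submodule_branches release_v1 0 subrepoA subrepoB && git meta push`, so the meta push is skipped whenever a submodule push failed and keep-going was not requested.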
For consistency problems, a warning would be sufficient and it's ok to leave it to the user to resolve. Right now, we use the deprecated "git-slave", and if a push fails on a repo, we know we are temporarily in an inconsistent state, but we just resolve it immediately.
I like the --keep-going option.
Thanks for implementing this. We are currently doing a pilot project with git-meta. If successful, we will roll it out to one product group, followed by one Business Unit, followed by the entire company.
Sorry, just to be clear: we'll happily take a patch on this, but I don't think we're likely to implement it ourselves.
I see. Guess I'll have to start looking at the source code. Reaching out to anyone else out there who is already familiar with the code and is willing to make the enhancement...