git-meta Provide option to enable pushing submodule commits to a branch of the same name as the destination meta commit

Our typical workflow very much depends on branches across multiple repos. The meta repo state that should be recorded for every commit should not only record the hash of the subrepos, but also the branch of the subrepo. This branch info is pretty important.

Envisioned workflow:

git meta checkout -b release_v1
for all subrepos:
  git checkout -b release_v1
  git push -u origin release_v1
git meta commit -a -m "Created release_v1 branch" # records "current" branch for all open subrepos
git meta push

The next time we need to do a hotfix on the release branch, what I'd like to do:

git clone meta
git meta checkout release_v1
git meta open some_subrepo # Exactly the same as today, but create a local branch with the name of the recorded branch.  Set tracking of the local branch to remote branch (if it exists).

At this point, if someone else has directly made updates to branch some_subrepo/release_v1, I can just do a "git pull --ff-only" to bring things up to date.

Without this branch info, we have to manually guess or somehow record as part of the commit what the "working branch" was at the time of the "git meta commit". A meta branch could potentially mix and match subrepo branches (i.e. meta:feature_a = subrepo1:master + subrepo2:feature_a + subrepo3:feature_b). On a new clone, we want to know which branch we should continue to work on for each subrepo.

The "git meta" can remain lightweight and not push branch names upstream, and leave this as a manual step for each subrepo.

Sep 07 '19 00:09 jhdub23

The architecture doc explains why we don't do this: because then you have (as you note) the possibility of "shear" between the submodule branches and the meta branches. If some_subrepo has branch release_v1 set to commit X, but the meta repo's branch release_v1 is set to commit Y, who wins? The only possible answer is the meta repo, because that's the only one we can update atomically.

The idea is that you never make submodule commits outside of the context of the meta repo. That's what git meta is for: to make it easy to make submodule commits from within the meta repo.

Sep 07 '19 00:09 novalis

I read through the architecture doc in detail. Maybe I'm missing something, but how is this different from your local repo branch pointing to one commit, and the remote repo pointing to a different commit? If the branches have diverged, then "git push" fails until you resolve the divergence through rebase or merge.

Sep 07 '19 01:09 jhdub23

In this case, the potential shear is between the meta repository's branches and the submodules' branches. You can't push atomically to the meta and submodules, or to multiple submodules (without weird custom server stuff, anyway). So it's possible for them to get out-of-sync. The question is: what are the semantics of this? We solve the problem by ignoring submodule branches, and only considering meta branches. (Inside Two Sigma, we do have a cronjob that populates submodule branches from meta repo branches, just for ease of browsing, but it's kind of a hack).

Sep 07 '19 01:09 novalis
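
To make the notion of "shear" concrete, here is a minimal sketch, not taken from git-meta itself. It assumes we already have two plain mappings (hypothetical inputs): the commit that the meta branch records for each submodule, and the tip of the same-named branch in each submodule's remote. The function name `find_shear` is made up for illustration.

```python
from typing import Dict, List, Tuple


def find_shear(
    meta_pins: Dict[str, str], branch_tips: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (submodule, pinned_sha, tip_sha) for every submodule whose
    same-named branch tip differs from the commit the meta branch records."""
    sheared = []
    for name, pinned in sorted(meta_pins.items()):
        tip = branch_tips.get(name)
        # A submodule without that branch cannot disagree with the meta repo.
        if tip is not None and tip != pinned:
            sheared.append((name, pinned, tip))
    return sheared


# Hypothetical state: meta branch release_v1 vs. each submodule's release_v1.
meta_release_v1 = {"subrepoA": "aaa111", "subrepoB": "bbb222"}
submodule_release_v1 = {"subrepoA": "aaa111", "subrepoB": "ccc333"}
print(find_shear(meta_release_v1, submodule_release_v1))
```

In this model the meta repo's value is the one that "wins"; the function only reports where the submodule branches disagree with it.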

Yes, I see the race condition/atomic problem across multiple repos if you actually try to push updates to branch heads. I'm thinking more along the lines of a synthetic meta branch head. On meta commit, record the commit hash plus current branch_name. This hash may or may not match origin/branch_name; we don't really care. We allow divergence with origin, and maybe just print a warning. On "git meta open", do the equivalent of:

git checkout -b branch_name hash
git branch --set-upstream-to origin/branch_name

At this point, the branch may be in a divergent state with respect to origin, but this is true today. The only difference is that you know what the original branch was for your rebase or merge operation, instead of having to guess.

Sep 07 '19 02:09 jhdub23
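
The proposed "git meta open" behavior could be sketched roughly as follows. This is only an illustration of the idea in the comment above, not how git-meta works today: the `RecordedSubmodule` type and its `branch` field are hypothetical, and the function only builds the git command lines rather than running them.

```python
from typing import List, NamedTuple, Optional


class RecordedSubmodule(NamedTuple):
    sha: str                # commit recorded by the meta commit (exists today)
    branch: Optional[str]   # hypothetical: branch name recorded at commit time


def open_commands(rec: RecordedSubmodule, remote_has_branch: bool) -> List[List[str]]:
    """Build the git commands to run inside a freshly opened submodule."""
    if rec.branch is None:
        # No branch recorded: stay on a detached HEAD at the recorded commit.
        return [["git", "checkout", "--detach", rec.sha]]
    cmds = [["git", "checkout", "-b", rec.branch, rec.sha]]
    if remote_has_branch:
        # Only set tracking if origin actually has a branch of that name.
        cmds.append(["git", "branch", f"--set-upstream-to=origin/{rec.branch}"])
    return cmds


for cmd in open_commands(RecordedSubmodule("abc123", "release_v1"), True):
    print(" ".join(cmd))
```

The local branch may still have diverged from origin/branch_name afterwards; the sketch only shows where the recorded name would be used.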

You don't need the original branch for your merge/rebase, because you can use git meta merge or rebase, which works on meta commits.

But if this is really something that seems exciting, you might be able to do this with hooks. But I think it would be confusing to have that sort of divergence.

Sep 07 '19 02:09 novalis

Could you say a bit more about why you need the original local branchnames? Typically, in git, the local branch names are considered ephemeral. They may persist in a commit message like "Merge from feature_a", but once merged, they're typically deleted, and then they age out of the local reflog, and then those names are gone forever. We usually use local branch names as a hint to the local developer, to help them switch between multiple simultaneous in-progress efforts.

It sounds like you have a different use case for local branch names? And it also sounds like you kind of care about remote branch names, but also kind of don't care? If you could say more about what you use the branch names for, maybe we could find a way to support your workflow with git-meta. (In particular, if remote branch shear is not a problem for you....)

Sep 07 '19 14:09 abliss

Perhaps one could use the submodule.<name>.branch config in .gitmodules for this: https://git-scm.com/docs/gitmodules#Documentation/gitmodules.txt-submoduleltnamegtbranch

I believe git-meta currently ignores this property (and we don't set it inside Two Sigma), but it might be easy to make git-meta open do what you're looking for when it's present.

Sep 07 '19 14:09 abliss
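
As a rough illustration of the .gitmodules suggestion, the sketch below reads the branch setting for each submodule from a sample file. The sample content and the helper name `submodule_branches` are assumptions made for this example. Python's configparser only approximates git's config syntax, so real tooling would more likely ask git directly (for instance with `git config -f .gitmodules --get submodule.subrepoA.branch`).

```python
import configparser
from typing import Dict, Optional

# Hypothetical .gitmodules content; subrepoB has no branch recorded.
SAMPLE_GITMODULES = """\
[submodule "subrepoA"]
\tpath = subrepoA
\turl = ../subrepoA.git
\tbranch = release_v1
[submodule "subrepoB"]
\tpath = subrepoB
\turl = ../subrepoB.git
"""


def submodule_branches(text: str) -> Dict[str, Optional[str]]:
    """Map each submodule name to its configured branch, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prefix = 'submodule "'
    result: Dict[str, Optional[str]] = {}
    for section in parser.sections():
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            result[name] = parser.get(section, "branch", fallback=None)
    return result


print(submodule_branches(SAMPLE_GITMODULES))
```

A "git meta open" that honored this setting could use the returned name as the local branch to create, and fall back to today's behavior when the value is None.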

We have a fairly large organization with multiple business units and multiple product groups within each business unit. Each product group has its own set of git repositories. We want to move to a monorepo across our entire company, but the scaling issues associated with one giant git repo are not acceptable. The git-meta architectural doc pretty much sums up our own conclusions, and we are now exploring using git-meta.

As a pilot deployment, we would layer git-meta on top of our existing repos. This was one of the bonuses of git-meta; it could coexist with our current multi-repo workflows. However, deploying git-meta without disrupting current multi-repo workflows means that what we see at the git-meta level should be consistent at the multi-repo level, and vice-versa.

We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed. If we hotfix at the meta level, we would need to make sure those hotfixes are reflected at the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merging or rebasing the meta commits, then pushing.

Relying on every developer to manually figure out what branch to check out when synchronizing the subrepo itself would be very error prone. Ideally, when we check out a branch of the meta and do a "git meta open", it would automatically change your local branch appropriately. At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:

cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase # Just works.  We are on the correct subrepo branch associated with the meta branch
git push
git meta push

vs

cd meta
git meta open subrepoA
cd subrepoA
git checkout -b <hmmm, what meta branch am I on?  release_v1? develop? master?  Am I already on the branch, or detached?>
...

We don't need git-meta to do any automatic subrepo branch ref updating. The subrepo branch pushes can be left outside the scope of git-meta. The only additional functionality needed is that git meta commits record enough info for a "git meta open" to automatically do the correct "git checkout -b" and "git branch --set-upstream-to".

Sep 07 '19 21:09 jhdub23

I think we would accept a patch to do this, as long as it was optional.

Sep 08 '19 13:09 novalis

We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed.

What happens if two people try to push a hotfix to the same branch at the same time? The architecture doc describes a possible race here: if each of two pushes succeeds in pushing to a different set of repos, they can become permanently deadlocked. Do you force developers to take out a central lock for the duration of the hotfix push?

If we hotfix at the meta level, we would need to make sure those hotfixes are reflected at the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merge/rebase the meta commits, then pushing.

Makes sense, but it doesn't sound like you actually have any dependency on the local branches in users' submodules. It sounds like it would be enough to ensure that, when pushing meta commit M:m={A:a, B:b} to branch br1, each submodule commit a,b,... must also be at the tip of a branch br1 in its own remote. You could probably do this without any change to client-side git-meta, by simply adding a remote pre-receive hook--something like this:

1. Take the global lock on the name br1.
2. For each submodule {$remote, $commit} in m, in parallel:
   2a. assert that $remote contains $commit (this is what the current pre-receive hook included in git-meta checks)
   2b. assert that the current branch br1 in $remote is an ancestor of $commit (else die and reject the push)
3. For each submodule {$remote, $commit} in m, in parallel:
   3a. fast-forward br1 to $commit in $remote
4. Release the lock on br1.
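
The hook logic above can be modeled in a few lines. This is only an in-memory sketch under several assumptions: each remote is a toy object holding a commit graph and its branch tips, an in-process `threading.Lock` stands in for the global lock, and the submodules are processed sequentially rather than in parallel. The names `Remote`, `is_ancestor`, and `accept_meta_push` are hypothetical.

```python
import threading
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit -> list of parent commits


def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """True if `ancestor` is `commit` itself or reachable through its parents."""
    stack, seen = [commit], set()
    while stack:
        cur = stack.pop()
        if cur == ancestor:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(graph.get(cur, []))
    return False


class Remote:
    """Toy stand-in for one submodule's remote repository."""

    def __init__(self, graph: Graph, branches: Dict[str, str]) -> None:
        self.graph = graph
        self.branches = dict(branches)


_branch_locks: Dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()


def accept_meta_push(branch: str, pins: Dict[str, str], remotes: Dict[str, Remote]) -> None:
    """Validate and apply a meta push; raise ValueError to reject it."""
    with _registry_lock:
        lock = _branch_locks.setdefault(branch, threading.Lock())
    with lock:  # take the lock on the branch name; released on exit
        for name, commit in pins.items():
            remote = remotes[name]
            if commit not in remote.graph:  # 2a: remote must contain the commit
                raise ValueError(f"{name}: missing commit {commit}")
            tip = remote.branches.get(branch)
            # 2b: the current branch tip must be an ancestor of the new commit
            if tip is not None and not is_ancestor(remote.graph, tip, commit):
                raise ValueError(f"{name}: {branch} is not an ancestor of {commit}")
        for name, commit in pins.items():  # 3a: fast-forward every branch
            remotes[name].branches[branch] = commit


graph = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
remotes = {"subrepoA": Remote(graph, {"br1": "c1"})}
accept_meta_push("br1", {"subrepoA": "c2"}, remotes)
print(remotes["subrepoA"].branches)
```

All checks run before any branch is moved, so a rejected push leaves every submodule branch untouched in this model.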

There might be some extra complexity around submodules being added/deleted/relocated, especially if you allow multiple commits to be pushed at once.

Also, if not everyone is using git-meta, there's a risk that someone would push a hotfix to the submodules' branches and neglect to update the meta branch. If you already have some exogenous procedure for locking the release branch to push hotfixes, you could probably patch it to ensure integrity with the meta repo.

At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:

cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase # Just works.  We are on the correct subrepo branch associated with the meta branch
git push
git meta push

We discourage our users from doing manual pulls and pushes in the submodules. It quickly causes the meta repo to get into inconsistent states which are hard for the user to understand. But, as described above, there should be no need for it. If you have taken care to ensure that the meta-repo branch br1 always points each submodule s to the head of br1 in s's remote, then the user can simply run git meta pull --rebase to atomically bring them from one consistent state to the next, and a single git meta push can atomically publish their work.

Sep 08 '19 19:09 abliss

What happens if two people try to push a hotfix to the same branch at the same time? The architecture doc describes a possible race here: if each of two pushes succeeds in pushing to a different set of repos, they can become permanently deadlocked. Do you force developers to take out a central lock for the duration of the hotfix push?

Currently, pushes happen manually one repo at a time (using the deprecated "gits" for some groups, manually by other groups), so pushes involving multiple repos can be interleaved between two people. At this point, both have to pull (whatever subset of repos that has been pushed), compile, run tests, then continue pushing. It's true that for a very short period of time, repos become out of sync, but this is quickly resolved by both parties. We've learned to live with this in our multirepo system. It's similar to a bad push causing compile or QA failures; when it happens, it's the highest priority to fix immediately.

Manually pushing the subrepos, even with git-meta, would maintain the status quo. However, git-meta pushes would be atomic and record states before we get into manually resolving the "race condition", so this would be an improvement to our current system.

We discourage our users from doing manual pulls and pushes in the submodules. It quickly causes the meta repo to get into inconsistent states which are hard for the user to understand.

I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go.

We will need to support both git-meta monorepo and our existing multirepo workflows for the transition period, and have a fallback plan if git-meta proves to be problematic for risk management. I believe that this would be true for any company with well established multi-repo workflows.

As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.

Sep 09 '19 04:09 jhdub23

I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go.

Oh believe me, I understand how that can go :)

To be clear, I didn't mean to encourage the whole organization to use git-meta instead of git. I meant that once a particular user has decided to use a git-meta clone, it's best if that user sticks to git meta push and git meta pull in that clone, rather than mixing in raw git submodule push/pulls. Maybe your users will turn out to be more submodule-savvy than ours, but most have found it terribly confusing.

As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.

For release branches, if your push rate is low, you may be able to get away with it. But I would like to opine that the biggest benefits of monorepo development do not accrue until you start using git-meta to snapshot all pushes to master. For example, bisecting across the history of the meta repo to find a bug is extremely powerful. (To be fair, this is also when the biggest pains accrue. A good discussion of the tradeoffs is at https://trunkbaseddevelopment.com .)

Sep 10 '19 12:09 abliss

The problem is when we have some people using git-meta and some not. How does a meta-user push his commits back to the subrepo so that the non-meta-user can see them? It's easy without branching, as there is only one branch (master). However, when there are branches, this branch selection becomes problematic.

I'll play around with branching within meta and construct a usage example where branch info is stored.

Sep 14 '19 00:09 jhdub23

Would it suffice if git meta push origin HEAD:refs/heads/foo would attempt to push each changed submodule to refs/heads/foo in its own remote? That should be pretty easy to implement, and wouldn't require any local tracking of branch history.

It will have the consistency problems already discussed, and an additional problem that, without some assistance from a remote hook of some kind, it's impossible for the client to know exactly which submodules need pushing. (Unless you have few enough submodules that you can expect users to open all of them.)

Sep 14 '19 01:09 abliss
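
A rough sketch of what "push each changed submodule to refs/heads/foo in its own remote" might look like. It assumes the set of changed submodules, with each one's remote URL and new commit, is already known, which is exactly the part the comment above says is hard without server-side help. The function name and the input shape are hypothetical, and the commands are only built, not executed.

```python
from typing import Dict, List


def submodule_push_commands(
    changed: Dict[str, Dict[str, str]], target_ref: str
) -> List[List[str]]:
    """Build one `git push <url> <sha>:<ref>` command per changed submodule.

    `changed` maps a submodule name to {"url": ..., "sha": ...}.
    """
    commands = []
    for name in sorted(changed):
        info = changed[name]
        commands.append(["git", "push", info["url"], f'{info["sha"]}:{target_ref}'])
    return commands


# Hypothetical changed submodules for `git meta push origin HEAD:refs/heads/foo`.
changed = {
    "subrepoB": {"url": "../subrepoB.git", "sha": "bbb222"},
    "subrepoA": {"url": "../subrepoA.git", "sha": "aaa111"},
}
for cmd in submodule_push_commands(changed, "refs/heads/foo"):
    print(" ".join(cmd))
```

Each push is independent, so the consistency problems discussed earlier still apply if one of them fails.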

Yes, I think that would do the trick. The local tracking of branch history is not needed. Git meta open should also read this branch and set the subrepo to this local branch name.

Any coordination with the subrepo origin/branch_name would be left up to the user (along with all the pitfalls). We can tool around this part.

Sep 14 '19 01:09 jhdub23

Ok, I updated the title to reflect the new goal. I propose the config be named gitmeta.pushSubmoduleBranches and I think it should probably be just a couple-line change around https://github.com/twosigma/git-meta/blob/master/node/lib/util/push.js#L230 . WDYT @novalis ?

Sep 14 '19 01:09 abliss

What do you want to do about the consistency problems? Warn the user when things get inconsistent? That seems fine, I guess.

Sep 14 '19 01:09 novalis

Yeah, I guess the default should be something like:

1. (optional, but probably a good idea): Fetch all repos that you plan to push, and check that each push is a fast-forward. If any aren't, fail early with a message like "Please pull and rebase/merge; if that doesn't help, the remote server may be in an inconsistent state".
2. Start pushing repos; if any submodule push fails after the first one succeeds, it should finish the other submodules(?), abort the meta push, and print a big scary warning that the server may now be inconsistent.
3. Maybe add a --keep-going flag to complete the meta push and also print the scary warning.

Sep 14 '19 15:09 abliss
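
The control flow proposed above could be sketched like this. It is a model of the proposal only, not existing git-meta code: the three callbacks (`is_fast_forward`, `push_submodule`, `push_meta`) are hypothetical hooks standing in for the real fetch and push operations, and the preflight check, which the proposal marks as optional, always runs here.

```python
from typing import Callable, List, NamedTuple, Sequence


class PushOutcome(NamedTuple):
    pushed: List[str]
    failed: List[str]
    meta_pushed: bool
    warnings: List[str]


def push_with_submodule_branches(
    submodules: Sequence[str],
    is_fast_forward: Callable[[str], bool],
    push_submodule: Callable[[str], bool],
    push_meta: Callable[[], None],
    keep_going: bool = False,
) -> PushOutcome:
    # Preflight: fail early, before pushing anything, if any push would not
    # be a fast-forward.
    stale = [name for name in submodules if not is_fast_forward(name)]
    if stale:
        msg = ("Please pull and rebase/merge; if that doesn't help, "
               "the remote server may be in an inconsistent state")
        return PushOutcome([], stale, False, [msg])

    pushed: List[str] = []
    failed: List[str] = []
    for name in submodules:
        # Keep pushing the remaining submodules even after one fails.
        (pushed if push_submodule(name) else failed).append(name)

    warnings: List[str] = []
    if failed and (pushed or keep_going):
        warnings.append(
            "WARNING: the server may now be inconsistent; failed: " + ", ".join(failed)
        )

    meta_pushed = False
    if not failed or keep_going:  # abort the meta push unless --keep-going
        push_meta()
        meta_pushed = True
    return PushOutcome(pushed, failed, meta_pushed, warnings)


outcome = push_with_submodule_branches(
    ["a", "b", "c"], lambda s: True, lambda s: s != "b", lambda: None
)
print(outcome)
```

With `keep_going=True` the same failure still produces the warning, but the meta push goes ahead.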

For consistency problems, a warning would be sufficient and it's ok to leave it to the user to resolve. Right now, we use the deprecated "git-slave", and if a push fails on a repo, we know we are temporarily in an inconsistent state, but we just resolve it immediately.

I like the --keep-going option.

Thanks for implementing this. We are currently doing a pilot project with git-meta. If successful, we will roll it out to one product group, followed by one Business Unit, followed by the entire company.

Sep 14 '19 19:09 jhdub23

Sorry, just to be clear: we'll happily take a patch on this, but I don't think we're likely to implement it ourselves.

Sep 19 '19 16:09 novalis

I see. Guess I'll have to start looking at the source code. Reaching out to anyone else out there who is already familiar with the code and is willing to make the enhancement...

Sep 20 '19 19:09 jhdub23
