rushstack
[rush] [feature] support for git submodules (they currently break rush build algo)
EDIT: this bug report was converted into a feature request.
Is this a feature or a bug?
- [x] Feature
- [ ] Bug
Please describe the actual behavior.
When I start clean, rush build builds all my projects that are listed in rush.json.
After I edit a file in one project, then when I run rush build again it skips the project and does not build it.
What is the expected behavior?
I was expecting rush to build the project because I modified it.
Workaround
For now I have to use rush rebuild --to package-name to force it to rebuild the package.
If this is a bug, please provide the tool version, Node.js version, and OS.
- Tool: rush
- Tool Version: 5.18.0
- Node Version: 13.6.0
- OS: NixOS
Is the file you edited ignored by .gitignore? Is it listed in <project>/.rush/temp/package-deps_build.json?
The file is not ignored by .gitignore, and in <project>/.rush/temp/package-deps_build.json I see:
{
  "files": {},
  "arguments": "tsc -p ./tsconfig.json "
}
Should it be listed there? How does it know what to list there?
Ooooooh. I think I see the issue:
My <project>s are git submodules. The "umbrella repo" that contains them, where I run rush build, cannot see which files changed because tracking those changes is the responsibility of each submodule.
Is there an alternative way to tell rush which files to check for changes?
Or maybe Rush can detect a sub module (or really just any folder with .git in it), go in there, then use git in that folder.
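To make the "any folder with .git in it" idea concrete, here is a minimal sketch of how a project folder could be classified. It relies on standard Git behavior: a standalone clone has a .git directory, while a checked-out submodule normally has a .git file whose first line starts with "gitdir:". The helper name detectNestedRepo and its return values are hypothetical and are not part of Rush.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical classification of a project folder; not a Rush API.
type NestedRepoKind = "directory" | "gitfile" | "none";

function detectNestedRepo(projectFolder: string): NestedRepoKind {
  const gitPath = path.join(projectFolder, ".git");
  let stats: fs.Stats;
  try {
    // lstat so that a symlinked .git is not followed.
    stats = fs.lstatSync(gitPath);
  } catch {
    // No .git entry at all: the folder belongs to the enclosing repo.
    return "none";
  }
  if (stats.isDirectory()) {
    // A full nested clone with its own object database.
    return "directory";
  }
  if (stats.isFile()) {
    // Submodules usually contain a "gitfile" pointing at the real Git directory.
    const firstLine = fs.readFileSync(gitPath, "utf8").split("\n")[0];
    return firstLine.startsWith("gitdir:") ? "gitfile" : "none";
  }
  return "none";
}
```

A tool could run this check once per project listed in rush.json and only switch to per-folder Git commands for folders that are not "none".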
I'd like to keep each project as a sub-module, because then they carry their own history which is not intertwined with other packages.
Having them as separate repos also allows them to be installable from git, not just from NPM. For example, a consuming project's package.json can have:
"dependencies": {
  "the-package": "github-username/repo-name#some-branch"
}
which makes it easy to temporarily rely on git refs from forks, etc.
This is something we haven't tested. This should become a feature request but we will need further design and a better understanding of what it means to support this scenario.
This is something we might want to think about including in an upcoming major release.
I haven't looked at the source yet, but it seems that at the point that Rush detects changes, instead of running getFiles(exec('git diff ...')) (or similar pseudo code) it could run projects.reduce((a, p) => a.concat(getFiles(exec('cd '+p.path+' && git diff ...'))), []) to get the list of modified files.
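The pseudocode above could be fleshed out roughly as follows. This is only a sketch of the idea, not how Rush actually detects changes: the function name getChangedFiles and the GitRunner type are made up for illustration, and it assumes every folder passed in is the root of its own Git repository (so the paths printed by git diff --name-only are relative to that folder). Untracked files are not reported by git diff and would need separate handling.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical abstraction so the Git call can be swapped out in tests.
type GitRunner = (cwd: string, args: string[]) => string;

// Runs the real git executable inside the given folder.
const defaultRunner: GitRunner = (cwd, args) =>
  execFileSync("git", args, { cwd, encoding: "utf8" });

// Collects tracked files that differ from HEAD, one Git invocation per folder.
// Assumption: each folder is the top level of its own repository.
function getChangedFiles(
  projectFolders: string[],
  runGit: GitRunner = defaultRunner
): string[] {
  const changed: string[] = [];
  for (const folder of projectFolders) {
    const output = runGit(folder, ["diff", "--name-only", "HEAD"]);
    for (const line of output.split("\n")) {
      const relativePath = line.trim();
      if (relativePath !== "") {
        // Re-anchor the path at the project folder.
        changed.push(path.join(folder, relativePath));
      }
    }
  }
  return changed;
}
```

Using execFileSync with the cwd option avoids building a 'cd ... && git ...' shell string, so folder names containing spaces or shell metacharacters are not a problem.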
@apostolisms Any update on this item? Currently we have a lot of git repos and we want to migrate them smoothly to mono repo with submodule, but currently we are not sure about rush support for submodule. Rush build algo failed to detect changes with
Error calculating the state of the repo. (inner error: Error: git hash-object exited with status 128: fatal: Unable to hash /home/htc/openapi-alps/@azure/rest-api-specs-scripts ). Continuing without diffing files.
One of the main benefits of a monorepo is the "you broke it, you fix it" property:
A PR cannot merge unless it successfully passes the tests for all downstream projects.
Without this, as you scale up, when one team makes a change to an upstream project, they can break another team. That team becomes a downstream "victim" because they don't encounter the problem until later, when they need to upgrade to the latest version. Normally this problem arises via NPM publishing (see this article for a basic sketch). But in the case of Git submodules it will happen when you update the submodule hash. The monorepo avoids this by pushing responsibility for breaks back onto the person who wants to make the change, avoiding downstream victims.
With submodules, you could work around this problem by using CI for each submodule repo, i.e. blocking PRs from being merged unless the main repo first completes a successful build. But this reproduces the exact same pros/cons of a monorepo, just in a more awkward form. So technically submodules don't seem to bring much benefit.
One benefit of submodules is that they would be presented better via GitHub, because GitHub's design (today) does not consider monorepos very much. Today, GitHub's UI assumes each repo is a single project. GitHub should really fix that. Other systems have better support for monorepos.
If you want to proceed anyway, it sounds like certain Rush operations assume that there is only one repo -- maybe we can relax that. For the incremental build issue you pointed out, the package-deps-hash library leverages Git as an optimization to quickly calculate file hashes. We could probably generalize that to work with submodules. We would certainly accept PRs to make these improvements. (However I'm not very interested in investing in that myself, as I don't see how submodules can scale. We try to focus on battle-tested strategies that will scale.) But if someone else wants to implement this, go for it.
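For context on what "file hashes" means here: for a regular file in a SHA-1 repository, git hash-object prints the SHA-1 of the header "blob <size>\0" followed by the file's bytes. The sketch below reproduces that calculation without invoking Git. It is an illustration only, not the package-deps-hash implementation, and the name gitBlobHash is hypothetical. A submodule path is a directory rather than a file, which is presumably why git hash-object reports "Unable to hash" in the errors quoted in this thread.

```typescript
import { createHash } from "node:crypto";

// Computes the blob object ID that Git assigns to file content.
// Assumption: the repository uses the default SHA-1 object format.
function gitBlobHash(content: Buffer): string {
  // Git prefixes the content with "blob <byte length>" and a NUL byte.
  const header = Buffer.from(`blob ${content.length}\0`, "utf8");
  return createHash("sha1").update(header).update(content).digest("hex");
}

// The well-known ID of the empty blob.
console.log(gitBlobHash(Buffer.alloc(0))); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

A generalized implementation could use a calculation like this as a fallback for files inside nested repositories, at the cost of reading every file instead of reusing hashes that Git has already computed.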
I haven't looked at the source yet, but it seems that at the point that Rush detects changes, instead of running getFiles(exec('git diff ...')) (or similar pseudo code) it could run projects.reduce((a, p) => a.concat(getFiles(exec('cd '+p.path+' && git diff ...'))), []) to get the list of modified files.
This would be slower. If we need a workaround like this, it may need to be behind an "opt-in" setting that enables support for submodules.
The rush change command may also have trouble, since it tries to determine what has changed since the base branch of your PR, which also involves some Git assumptions. Probably also solvable, though.
Currently we have a lot of git repos and we want to migrate them smoothly to mono repo with submodule,
Why not create a new clean monorepo, and then merge the projects into it one by one? IMO this will go a lot smoother than trying to maintain two models side by side.
@octogonz Thank you for your reply! Our problem is a little bit complicated: we have several public repos and several private repos, and we still want to maintain them with a monorepo. That means we need to synchronize between the public repos and the private monorepo. Previously I was considering git submodules, but after some research I found git subtree. We can push/pull from a git subtree, which meets our needs; however, I'm not sure about the limitations of git subtree with Rush. Do you have any comments on our situation? Thanks in advance!
+1 on this. We use Rush in our project, and I want to import my SCSS project (not an npm package at all; just pure SCSS files) as a submodule so I can import my mixins. Currently Rush fails to find changes in a project (not the submodule itself) that has submodules inside.
Our problem is a little bit complicated: we have several public repos and several private repos, and we still want to maintain them with a monorepo.
This scenario makes sense to me. For this setup, ideally the CI should treat them as a monorepo. In other words, for a pull request to any repo, it will use the same CI build/tests that build the integrated monorepo and run all its tests.
We can push/pull from a git subtree, which meets our needs; however, I'm not sure about the limitations of git subtree with Rush. Do you have any comments on our situation? Thanks in advance!
From what I understand:
- submodules store a pointer to a specific commit hash from an external repo. When people clone, they need to provide an extra option to tell Git to also clone the files from the external repo.
- subtrees store a squashed copy of an external repo in your repo. I believe an extra squashed commit gets added every time you update the subtree. (?) This makes the local repo larger, but it makes cloning simple and transparent.
When you make changes to a submodule, it has its own .git folder, so you can push to the external repo normally. Whereas if you make changes to a subtree, from what I understand they first get committed to the squashed copy. Then git subtree somehow creates a normal clone of the external repo, and cherry-picks your changes to that, and then pushes it. People say it can be slow.
git subtree does sound like it would be likely to work with rush build, since the subtree files will have normal file hashes in Rush's repo. And if you had 50 subtrees, I would guess that operations like cloning/building/diffing should scale just fine. Whereas with submodules I would be concerned about performance problems because each submodule has to be cloned/diffed separately.
Thus tentatively maybe it is a good idea for Rush users to choose subtrees instead of submodules. If this makes sense based on other people's experience, we can update the Rush documentation to recommend this. (But with the caveat that human workflows are necessarily going to be more awkward when you have to coordinate PRs against multiple different repos.)
@octogonz Oh yes thanks a lot for your explanation! Seems like I need to dig into git subtree's internal to evaluate but in general it should be fine. We will report our experience after migrating to this solution.
Submodules also break the cache (/docs is a submodule):
Error calculating the state of the repo. (inner error: Error: git hash-object exited with status 128: fatal: Unable to hash docs
). Continuing without diffing files.
Just migrated from lerna to rush. This is my only problem so far.
Just migrated from lerna to rush. This is my only problem so far.
We would like to support submodules, and would happily accept a community PR. However I suspect the official Rush maintainers are unlikely to implement it, because as far as I know, nobody is wanting submodules in any of our monorepos. Every time someone suggests it, it seems like they always end up preferring a different solution.
Are there any alternative tools that support git submodules?
Are there any alternative tools that support git submodules?
Looks like some support was implemented in https://github.com/microsoft/rushstack/pull/233
And @chengcyber has created PR https://github.com/microsoft/rushstack/pull/3749 that maybe will fix some other problems.
Thanks @chengcyber for fixing this in PR #3749!
🚀 Released with Rush 5.83.3