Turn stdarch into a subtree
stdarch is one of the few remaining big submodules in the rustc repo. And it's painful -- stdarch depends on a lot of very internal rustc implementation details, but refactoring / improving those details always requires a complicated multi-stage process to ensure stdarch keeps building. It would be so much easier if the two could be updated in lock-step, but with a submodule that's not very practical. I ran into this twice just recently: as part of https://github.com/rust-lang/rust/issues/128738, and as part of https://github.com/rust-lang/rust/pull/131349. To ensure our continued ability of maintaining and improving rustc, we should fix this situation.
Miri, clippy, portable-simd, RA are all using subtrees, using either git subtree or josh. git subtree has some bad interacts with git blame due to the way it mirrors the changes between the repo. josh has the downside that many of the merge commits from the rustc repo end up being mirrored into the subrepo (see https://github.com/josh-project/josh/issues/1328). I don't care very strongly which is used, though the general trend has been away from git subtree and towards josh (RA recently migrated, clippy decided to migrate but the effort seems currently stalled). If you pick josh, I can help with the migration. (I have some migration notes here but those are for a migration from git subtree to josh. Migrating from submodules will look a bit different.) You'll likely want some maintainer scripts in the stdarch repo to simplify pushing and pulling changes from and to the rustc repo; cargo xtask is probably the easiest way to set that up (that's what RA uses).
Last time I brought this up with @Amanieu, they were concerned about testing changes that land via rustc, since not the full stdarch test suite runs in rustc CI. I don't know how much testing in rustc CI is required to alleviate these concerns -- would be good to get a clear list of requirements. And then I hope someone can get motivated to work on this so that stdarch can stop being a roadblock for rustc evolution. :)
I think the main issue is writing the logic that triggers the stdarch CI bits only when something is changed on here.
git subtree seems available out of box, not sure how bad is the git blame issue or how difficult is to setup josh w/out docker so I cannot tell which is nicer to use. I noticed that git-subrepo seems available.
I think the main issue is writing the logic that triggers the stdarch CI bits only when something is changed on here.
Generally we don't do this in rust-lang CI since it is rather fragile -- it is easy to miss some file that can make tests fail, and then it can be tricky to debug when a failure actually got introduced. CI should be stateless -- it should ensure that the repo satisfies certain checks, in a repeatable way that does not depend on past history. Miri and clippy for example get tested on rustc CI each time, not just when their subtrees change.
how difficult is to setup josh w/out docker
I use josh without docker, cargo install is all it takes.
git subtree seems available out of box
AFAIK one has to use a patched version to make it usable with large repos like rustc, otherwise it is way too slow.
I noticed that git-subrepo seems available.
I have not tried this one. It seems to be squashing the subrepo history in the main repo into a single large commit for each pull, whereas git subtree and josh both preserve individual subrepo commits.
Cc @calebzulawski @flip1995 @lnicola (for portable-simd, clippy, RA) in case they have anything to add to the choice of tool here.
My only comment is that I often find myself deleting the branch and recreating it when something goes a little awry with git-subtree. I haven't tried josh yet but it seems possibly easier to use.
Then I guess josh is a winner :)
This is still causing regular pain during compiler work, e.g. when cleaning up intrinsics in https://github.com/rust-lang/rust/pull/136543.
https://github.com/rust-lang/rust/issues/132735 is also running into this.
Can we make some sort of concrete plan here, so that stdarch stops being a constant thorn in the side of compiler contributors?
I guess the main question pending is who's doing the conversion and secondarily who's going to write the documentation so contributors won't get lost :)
If it's going to be done with josh-proxy, I can help, but the maintainers need to be closely involved to ensure this works long-term. rustc-dev-guide was recently migrated from a submodule to a josh-subtree so that could be used as a template (and it points out some of the potential pitfalls). Cc @Kobzol
Sure, I'm willing to do the migration! I'll have to check stdarch's CI though, I remember it being a bit complex.
Do you think that it is necessary to run the full CI in rust-lang/rust, or could we get away with a subset that checks the most critical stuff, and then deal with the rest during pulls?
I believe the best plan for the long term is to fully migrate stdarch into rust-lang/rust and archive this repository. stdarch is only a separate repo for historical reason since it was originally an independent crate. Today stdarch is deeply tied to compiler and standard library internals so this no longer makes sense. Additionally it would help to run stdarch's testsuite on rustc changes since the tests frequently break on rustc changes, especially LLVM upgrades.
As such I think the best plan for a migration is:
- Make rust-lang/rust CI run the stdarch testsuite, while keeping it as a - submodule. This would involve running the tests for all architectures under a single runner since the total runtime roughly matches a single rust-lang/rust runner.
- Replace the submodule in rust-lang/rust with josh. Encourage stdarch development to happen on rust-lang/rust instead of stdarch.
- Once all PRs and issues have been merged and/or migrated, the stdarch repository can be archived.
That would also work, of course -- and I agree re: all the implementation details.
Is it worth doing the josh setup if in the end we'll want to retire the separate repo anyway?
I'm fully in favor of moving stdarch into rust-lang/rust!
Make rust-lang/rust CI run the stdarch testsuite, while keeping it as a - submodule. This would involve running the tests for all architectures under a single runner since the total runtime roughly matches a single rust-lang/rust runner.
That would require cross-compilation, I assume. Is stdarch configured in a way where this is easy to do?
I was more thinking about the opposite direction, i.e. run stdarch tests for architecture X in an existing rustc runner that already runs rustc (and other) tests for architecture X.
In theory it should just be a matter of executing all the run-docker.sh scripts in sequence on an appropriate runner: each docker container is set up for cross-compilation for the appropriate target and will run tests under qemu.
We have a handful of windows/mac targets which use plain run.sh without docker, those could be attached to existing runners.
Hmm, it's not that simple, I think. Linux jobs in r-l/r already run inside Docker, and tackling another Docker after tham would have a lot of complexity, I don't think we will want to do that. So we'll need to transplant the stdarch jobs into the rustc Docker containers somehow. Furthermore, the stdarch test suite will somehow need to be reimplemented within bootstrap, so it can be executed locally, and not just on CI. I'll take a look at how it could be done.
(An off-topic post got removed, not sure by whom)
I should really stop trying to clean up rustc intrinsics, I hit this problem every single time and it is so painful. :/ Just this month I hit it 3 times...
stdarch seriously needs to stop adding so much extra drag to rustc cleanups.
Migrating the test suite is... non-trivial. But we could turn stdarch into a (Josh) subtree anyway, and run at least some smoke tests on rust-lang/rust, in order to make these transitions easier. The full test suite would still run only in this repo.
Yeah, this could be paired with a cronjob that syncs and runs the full test suite against the latest rustc every morning. I guess the question is whether that's acceptable for @Amanieu.
Considering the amount of trouble this is causing, I'd be OK with just setting up a subtree now and figuring out the CI situation later.
Great, thanks :-)
Ok, since we're now also migrating compiler-builtins to a subtree, I think I'll revive the idea of creating some shared Josh sync scripts, and then I'll try to bootstrap the move into rust-lang/rust.
Happy to provide Josh advice where I can :)
@Kobzol For a shared Josh sync script, also have a look at https://github.com/rust-lang/rust-clippy/pull/12759. We'll be doing some things slightly differently there, as we pin a nightly in a rust-toolchain.toml file in our repo, rather than pinning a commit hash and using RTIM, as miri does IIRC.
How do you ensure that a rustc-push uses the exact same commit as base as the one that the most recent rustc-pull pulled from? I would strongly recommend ensuring this invariant, it helps keep josh on the well-behaved paths.
Link to our old Zulip thread where we talked about this 😅 (I recently looked that up to re-figure this out myself)
Basically bumping the nightly in rust-toolchain.toml file to today's version during the rustc-pull, then using rustc -vV to get the commit of the nightly. That commit will then be used for both pull and push.
The move to Josh itself is implemented in https://github.com/rust-lang/rust/pull/141899.
Just a small question, what will this entail for the contributors (sorry no experience with subtrees)? Will the PRs have to be sent to the rustc repo or this repo itself? And do we need to do something more when editing stdarch in a rustc PR?
Most changes should still be sent to this repository. Only when there is a change that also requires a corresponding rustc change (or if there is a rustc change that requires a stdarch change) should the PR be sent against rust-lang/rust.
And do we need to do something more when editing stdarch in a rustc PR?
I don't think so.
Ah ok thanks, so no more 3-PR chains just to add some target features 🎉.
And one more question, will the changes to this repo be automatically synced to the rustc subtree, or do we have to do it manually like now?