drake
macOS arm64 releases
Now that we have MOSEK 10 landed, we should figure out and execute the next steps for macOS arm64 support.
That will include items such as:
- Updating tools/macos-arch-arm64.bazelrc to re-enable mosek.
- Adding CI jobs for "everything" and/or "snopt-mosek", getting them passing, and then into Production.
- Adding CI jobs for "packaging", getting them passing, and then into Production.
- Adding wheels.
@svenevs and/or @BetsyMcPhail could investigate and then propose a plan of attack in the issue here, and then we start working this on the board?
The primary objective will be shipping packaging (tgz) nightlies for native arm64 (with snopt + mosek).
The outline above seems good to me -- will begin by checking if reenabling in the existing experimental everything build works.
Adding CI jobs for "everything" and/or "snopt-mosek", getting them passing, and then into Production.
In adding snopt-mosek, do we want to remove the snopt job? mac-m1-monterey-unprovisioned-clang-bazel-nightly-snopt-release => nightly-snopt-mosek-release? Or do we seek to keep the snopt-only version too?
In general we should aim to be congruent with what the other platforms do, which I believe define both snopt and snopt-mosek. We might need a different schedule on M1 due to resource constraints, but we should at least have "experimental" jobs with the full matrix.
In general we should aim to be congruent with what the other platforms do, which I believe define both snopt and snopt-mosek.
We matched experimental to existing macOS, not linux. Linux has a number of jobs that macOS does not.
We might need a different schedule on M1 due to resource constraints, but we should at least have "experimental" jobs with the full matrix.
Agreed. The plan for nightly and continuous (when finalized) will get posted here in summary form with links to the corresponding drake-jenkins-jobs PRs.
3 experimental jobs were removed (provisioned-clang-bazel-experimental-snopt-release, unprovisioned-clang-bazel-experimental-snopt-release, unprovisioned-clang-bazel-experimental-everything-release). 11 experimental jobs were added:
- [x] mac-m1-monterey-provisioned-clang-bazel-experimental-everything-address-sanitizer: needs #17911.
- [x] mac-m1-monterey-provisioned-clang-bazel-experimental-address-sanitizer: needs #17911.
- [x] mac-m1-monterey-provisioned-clang-bazel-experimental-debug: needs #17911.
- [X] mac-m1-monterey-provisioned-clang-bazel-experimental-everything-debug: needs #17911, #17915 via #17998.
- [X] mac-m1-monterey-provisioned-clang-cmake-experimental-debug
- [X] mac-m1-monterey-provisioned-clang-cmake-experimental-everything-debug
- [X] mac-m1-monterey-provisioned-clang-cmake-experimental-everything-release
- [X] mac-m1-monterey-provisioned-clang-cmake-experimental-release
- [X] mac-m1-monterey-unprovisioned-clang-cmake-experimental-release
- [ ] mac-m1-monterey-unprovisioned-clang-wheel-experimental-snopt-mosek-release: Wheel experimental builds failing for unknown reasons: #17906
- [x] mac-m1-monterey-unprovisioned-clang-bazel-experimental-snopt-mosek-packaging: snopt-mosek-packaging builds but does not upload: #17909
I think the only thing left here is documenting the existence of releases. We had discussed potentially doing an ex-post-facto arm64 release for v1.8.0, but at this point I'm about to try to update the orka xcode (may be a challenge: an OS update is required, and I'm not sure it persists with their virtualization). As such, maybe we hold off on advertising until after the next release / xcode update is complete.
Yeah, we're close enough to v1.9.0 (~2 weeks out) that I'm fine just waiting for that to ship stable arm binaries.
I'd like to discuss nightly first, and get an understanding for what the coverage desires are. Currently we run nightly (job (average full run time)):
- mac-m1-monterey-provisioned-clang-bazel-nightly-everything-release (1 hour)
- mac-m1-monterey-provisioned-clang-bazel-nightly-release (2 hours)
- mac-m1-monterey-unprovisioned-clang-bazel-nightly-everything-release (3 hours)
- mac-m1-monterey-unprovisioned-clang-bazel-nightly-release (2 hours)
- mac-m1-monterey-unprovisioned-clang-bazel-nightly-snopt-mosek-packaging (42 minutes)
Note:
- No build cache server currently (investigations underway).
- Job concurrency maximum: 4
- Those nightlies primarily went in because the other flavors (everything, and debug in particular) were broken for some time.
For x86_64:
- 4 of the equivalent jobs as m1 above.
- m1 has unprovisioned-clang-bazel-nightly-everything-release, x86_64 does not.
- Additionally:
  - mac-big-sur-clang-bazel-nightly-address-sanitizer
  - mac-big-sur-clang-bazel-nightly-debug
  - mac-big-sur-clang-bazel-nightly-everything-address-sanitizer
  - mac-big-sur-clang-bazel-nightly-everything-debug
  - mac-big-sur-clang-cmake-nightly-debug
  - mac-big-sur-clang-cmake-nightly-everything-debug
  - mac-big-sur-clang-cmake-nightly-everything-release
  - mac-big-sur-clang-cmake-nightly-release
  - mac-big-sur-unprovisioned-clang-cmake-nightly-release
  - mac-big-sur-unprovisioned-clang-wheel-nightly-snopt-mosek-release (ignore for now, m1 wheel TODO)
Proposal:
- Drop:
  - unprovisioned-clang-bazel-nightly-everything-release (I don't see what value it adds)
- Keep:
  - clang-bazel-nightly-everything-release
  - clang-bazel-nightly-release
  - unprovisioned-clang-bazel-nightly-release
  - unprovisioned-clang-bazel-nightly-snopt-mosek-packaging
- Add:
  - clang-bazel-nightly-everything-debug
  - address sanitizer? or everything-address-sanitizer?
  - 2-4 cmake jobs (matrix: (provisioned, unprovisioned) x (release, debug))
    - Which is more important? Checking CMake against release and debug? Or provisioned CMake vs unprovisioned?
    - There are already release and debug bazel flavors.
    - everything-release and everything-debug are the most important m1 jobs historically (solver issues and tolerances etc), but as long as that's hit by bazel, does cmake-everything-{release,debug} add any additional value?
    - Note: vSphere cmake jobs are fast because of caching; that will not be true here.
So to clarify, I think we should be able to safely try out 8 jobs nightly; it "should" play out as 4 running first and then 4 after, finishing by 9am-ish. I need help understanding the priorities of "everything" via bazel vs cmake, and release / debug.
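The "4 first, then 4 after" expectation can be sanity-checked with a small simulation. This is only a sketch: it assumes jobs start in list order on whichever of the 4 executors frees up first, and it uses the average run times of the five current nightly jobs listed above as sample input. `makespan` is a hypothetical helper, not part of any Drake or Jenkins tooling.

```python
import heapq


def makespan(durations_min, slots):
    """Return the finish time (minutes) when jobs start in order on `slots` executors."""
    # Each heap entry is the time at which one executor becomes free.
    free_at = [0] * slots
    heapq.heapify(free_at)
    for duration in durations_min:
        start = heapq.heappop(free_at)  # earliest available executor
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Average run times (minutes) of the five current M1 nightly jobs listed above.
current_nightly = [60, 120, 180, 120, 42]
print(makespan(current_nightly, slots=4))
```

With these five jobs the 3-hour job dominates, so the whole set finishes in 180 minutes. Estimating the proposed 8-job set the same way would need measured run times for the jobs that are not yet scheduled.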
For continuous I think we should only launch three jobs (leaves one available for experimental users), and believe snopt-mosek-packaging should be in there. But I think the nightly discussion will help clarify priorities for continuous and weekly.
It's still on my TODO list to write up some thoughts about schedule.
In the meantime, please make sure there's already an Experimental job defined for anything being proposed for a regular schedule. Part of the input to schedule will be launching an Experimental to see what state we're currently in.
I'm still working on calculating a good schedule for new arm jobs. In the meantime, let me post some cleanups that will be necessary as part of this. (I'm fine either doing these concurrently with the arm additions, or as a prior first pass before adding the larger arm schedule.)
(1) Using m1 in the job name isn't durable -- it's a CPU name, not an architecture name.
We might as well fix that promptly now, since we're going to need to fix it eventually. My proposal is to use arm instead of m1 for all of these jobs.
(2) We have jobs like mac-m1-monterey-provisioned-clang-bazel-nightly-release with an unusual -provisioned- in their title.
That's inconsistent with the rest of CI where the "provisioned" is implied unless otherwise stated as -unprovisioned-. We should either drop -provisioned- from the M1 jobs, or add -provisioned- to all of the other jobs (linux included). The former is probably much easier.
(3) It seems worthwhile to be clear that all of the other Mac builds are not arm builds.
I'd propose renaming them to add -x86- to their job name. So for example mac-x86-monterey-clang-bazel-continuous-release instead of mac-monterey-clang-bazel-continuous-release.
(4) In the CI views tabs at https://drake-jenkins.csail.mit.edu/ we have an incomplete "Mac Monterey" tab.
It doesn't have all Mac Monterey jobs -- only the x86 ones. Either we should add the arm jobs to that tab, or we should rename the tab to "Mac x86 Monterey" (and add an arm tab twin).
(5) When we rename any CI jobs, we'll also need to update cross-references.
There are often a few places on the website (drake/doc) and maybe the release tooling (drake/tools) that would also need to be updated to match.
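Taken together, proposals (1) through (3) amount to a mechanical rewrite of each macOS job name. A minimal sketch of that rewrite follows, assuming job names are hyphen-separated tokens beginning with `mac`; `rename_job` is a hypothetical helper for illustration, not something that exists in drake-jenkins-jobs.

```python
def rename_job(name):
    """Apply proposals (1)-(3) to a macOS job name (hypothetical helper)."""
    tokens = name.split("-")
    if tokens[0] != "mac":
        return name  # Non-macOS jobs are left alone.
    # (2) "provisioned" is implied, so drop it; "unprovisioned" is a different token.
    tokens = [t for t in tokens if t != "provisioned"]
    if tokens[1] == "m1":
        # (1) Use the architecture name instead of the CPU name.
        tokens[1] = "arm"
    elif tokens[1] not in ("arm", "x86"):
        # (3) Every other macOS job is an x86 build, so say so.
        tokens.insert(1, "x86")
    return "-".join(tokens)


print(rename_job("mac-m1-monterey-provisioned-clang-bazel-nightly-release"))
print(rename_job("mac-monterey-clang-bazel-continuous-release"))
```

The second call reproduces the mac-x86-monterey-clang-bazel-continuous-release example from (3). The function is idempotent, so applying it to an already-renamed job leaves the name unchanged.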
One thing to keep in mind is that when a job is renamed, all of the history will be lost. I'm not suggesting that should stop us but just something to be aware of.
In the meantime, please make sure there's already an Experimental job defined for anything being proposed for a regular schedule. Part of the input to schedule will be launching an Experimental to see what state we're currently in.
I confirmed that Big Sur, Monterey x86 and Monterey arm all have the same set of experimental jobs. All proposed nightly jobs (above) have an equivalent experimental job.
- In the past, there was a reason why mac-m1 jobs had provisioned added to their name. No one can remember exactly why that was, though, and it's not documented anywhere. The provisioned has been removed (drake-jenkins-jobs #38). I launched mac-m1-monterey-clang-bazel-experimental-release and mac-monterey-clang-bazel-experimental-release to test that the correct version of the job (m1 vs x86) was launched. No cross-reference updates needed for this change.
- M1 (Arm) jobs have been added to the Monterey tabs (drake-jenkins-jobs #39)
- M1 jobs have been renamed to ARM (https://github.com/RobotLocomotion/drake-jenkins-jobs/pull/40). The release playbook has been updated to use the new names in #18142
- Non-ARM jobs have been renamed to mac-x86-.... (https://github.com/RobotLocomotion/drake-jenkins-jobs/pull/41). The documentation has been updated to use the new names in #18143
- Cross references were updated as part of (1) and (3).
All of the cleanup has been completed.
With #18189 landed we should probably create mac-arm64 wheel nightly (and maybe continuous) builds?
Here's my proposal for a plan:
macOS CI matrix plan
Package binary builds
- flavor: (2) Wheel, Packaging
- arch: (2) X86 and ARM
- osx_ver: (1) Minimum supported (currently Monterey)
- symbols: (1) Release
- solvers: (1) SNOPT+MOSEK
- trigger: (3) Staging and Nightly and Experimental
- mac-arm-monterey-unprovisioned-clang-bazel-experimental-snopt-mosek-packaging
- mac-arm-monterey-unprovisioned-clang-bazel-nightly-snopt-mosek-packaging
- mac-arm-monterey-unprovisioned-clang-bazel-staging-snopt-mosek-packaging
- mac-arm-monterey-unprovisioned-clang-wheel-experimental-snopt-mosek-release
- mac-arm-monterey-unprovisioned-clang-wheel-nightly-snopt-mosek-release
- mac-arm-monterey-unprovisioned-clang-wheel-staging-snopt-mosek-release
- mac-x86-monterey-unprovisioned-clang-bazel-experimental-snopt-mosek-packaging
- mac-x86-monterey-unprovisioned-clang-bazel-nightly-snopt-mosek-packaging
- mac-x86-monterey-unprovisioned-clang-bazel-staging-snopt-mosek-packaging
- mac-x86-monterey-unprovisioned-clang-wheel-experimental-snopt-mosek-release
- mac-x86-monterey-unprovisioned-clang-wheel-nightly-snopt-mosek-release
- mac-x86-monterey-unprovisioned-clang-wheel-staging-snopt-mosek-release
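The job list above is the Cartesian product of the axes in the bullets (2 flavors x 2 arches x 3 triggers = 12 jobs). As an illustration, the sketch below regenerates it. The name template is read off the job names in this comment, and `package_jobs` and the constants are hypothetical names, not the actual drake-jenkins-jobs configuration.

```python
from itertools import product

ARCHES = ["arm", "x86"]
TRIGGERS = ["experimental", "nightly", "staging"]
# flavor -> (driver token, trailing tokens), as they appear in the job names above.
FLAVORS = {
    "packaging": ("bazel", "snopt-mosek-packaging"),
    "wheel": ("wheel", "snopt-mosek-release"),
}


def package_jobs(osx_ver="monterey"):
    """Expand the package-build matrix into job names (hypothetical helper)."""
    jobs = []
    for arch, (driver, suffix), trigger in product(ARCHES, FLAVORS.values(), TRIGGERS):
        jobs.append(
            f"mac-{arch}-{osx_ver}-unprovisioned-clang-{driver}-{trigger}-{suffix}"
        )
    return jobs


for job in package_jobs():
    print(job)
```

Bumping the minimum supported macOS version would then be a one-argument change to `osx_ver`.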
CMake cached builds
- arch: (2) X86 and ARM
- osx_ver: (2) All (currently Monterey and Ventura)
- driver: (1) CMake
- symbols: (1) Release
- solvers: (2) None and Everything
- trigger: (2) Nightly and Experimental
- mac-arm-monterey-clang-cmake-experimental-everything-release
- mac-arm-monterey-clang-cmake-experimental-release
- mac-arm-monterey-clang-cmake-nightly-everything-release
- mac-arm-monterey-clang-cmake-nightly-release
- mac-arm-ventura-clang-cmake-experimental-everything-release
- mac-arm-ventura-clang-cmake-experimental-release
- mac-arm-ventura-clang-cmake-nightly-everything-release
- mac-arm-ventura-clang-cmake-nightly-release
- mac-x86-monterey-clang-cmake-experimental-everything-release
- mac-x86-monterey-clang-cmake-experimental-release
- mac-x86-monterey-clang-cmake-nightly-everything-release
- mac-x86-monterey-clang-cmake-nightly-release
- mac-x86-ventura-clang-cmake-experimental-everything-release
- mac-x86-ventura-clang-cmake-experimental-release
- mac-x86-ventura-clang-cmake-nightly-everything-release
- mac-x86-ventura-clang-cmake-nightly-release
We also want to have an Unprovisioned Experimental available for any regularly scheduled build flavor, so we'll need these as well:
- mac-arm-monterey-unprovisioned-clang-cmake-experimental-release
- mac-x86-monterey-unprovisioned-clang-cmake-experimental-release
- mac-arm-ventura-unprovisioned-clang-cmake-experimental-release
- mac-x86-ventura-unprovisioned-clang-cmake-experimental-release
- mac-arm-monterey-unprovisioned-clang-cmake-experimental-everything-release
- mac-x86-monterey-unprovisioned-clang-cmake-experimental-everything-release
- mac-arm-ventura-unprovisioned-clang-cmake-experimental-everything-release
- mac-x86-ventura-unprovisioned-clang-cmake-experimental-everything-release
Bazel bootstrapping builds
- flavor: (1) Unprovisioned
- arch: (2) X86 and ARM
- osx_ver: (2) All (currently Monterey and Ventura)
- driver: (1) Bazel
- symbols: (1) Release
- solvers: (1) None
- trigger: (2) Nightly and Experimental
- mac-arm-monterey-unprovisioned-clang-bazel-experimental-release
- mac-arm-monterey-unprovisioned-clang-bazel-nightly-release
- mac-arm-ventura-unprovisioned-clang-bazel-experimental-release
- mac-arm-ventura-unprovisioned-clang-bazel-nightly-release
- mac-x86-monterey-unprovisioned-clang-bazel-experimental-release
- mac-x86-monterey-unprovisioned-clang-bazel-nightly-release
- mac-x86-ventura-unprovisioned-clang-bazel-experimental-release
- mac-x86-ventura-unprovisioned-clang-bazel-nightly-release
Bazel cached builds
- arch: (2) X86 and ARM
- osx_ver: (2) All (currently Monterey and Ventura)
- driver: (1) Bazel
- symbols: (1) Release
- solvers: (2) None and Everything
- trigger: (2) (Continuous ^ Nightly) and Experimental †
† Continuous for solvers=None, Nightly for solvers=Everything.
- mac-arm-monterey-clang-bazel-continuous-release
- mac-arm-monterey-clang-bazel-experimental-everything-release
- mac-arm-monterey-clang-bazel-experimental-release
- mac-arm-monterey-clang-bazel-nightly-everything-release
- mac-arm-ventura-clang-bazel-continuous-release
- mac-arm-ventura-clang-bazel-experimental-everything-release
- mac-arm-ventura-clang-bazel-experimental-release
- mac-arm-ventura-clang-bazel-nightly-everything-release
- mac-x86-monterey-clang-bazel-continuous-release
- mac-x86-monterey-clang-bazel-experimental-everything-release
- mac-x86-monterey-clang-bazel-experimental-release
- mac-x86-monterey-clang-bazel-nightly-everything-release
- mac-x86-ventura-clang-bazel-continuous-release
- mac-x86-ventura-clang-bazel-experimental-everything-release
- mac-x86-ventura-clang-bazel-experimental-release
- mac-x86-ventura-clang-bazel-nightly-everything-release
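The † rule means the trigger axis is not independent of the solvers axis: each solver set gets exactly one scheduled trigger, plus Experimental. A short sketch of that constraint follows, with the naming pattern inferred from the job names above; `bazel_cached_jobs` and `SCHEDULED_TRIGGER` are hypothetical names used only for illustration.

```python
from itertools import product

# Scheduled trigger per solver set, per the † note. Experimental is always added.
SCHEDULED_TRIGGER = {"none": "continuous", "everything": "nightly"}


def bazel_cached_jobs(arches=("arm", "x86"), osx_vers=("monterey", "ventura")):
    """Expand the cached-Bazel matrix into job names (hypothetical helper)."""
    jobs = []
    for arch, osx_ver, solvers in product(arches, osx_vers, SCHEDULED_TRIGGER):
        # solvers=None has no token in the job name; Everything adds "everything-".
        solver_token = "" if solvers == "none" else "everything-"
        for trigger in (SCHEDULED_TRIGGER[solvers], "experimental"):
            jobs.append(
                f"mac-{arch}-{osx_ver}-clang-bazel-{trigger}-{solver_token}release"
            )
    return sorted(jobs)


for job in bazel_cached_jobs():
    print(job)
```

Encoding the rule as a lookup table keeps combinations such as continuous-everything-release or nightly-release from being generated by accident.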
Bazel uncached builds
- arch: (2) X86 and ARM
- osx_ver: (1) Minimum supported (currently Monterey)
- driver: (1) Bazel
- symbols: (2) Debug and ASan
- solvers: (2) None and Everything
- trigger: (2) Nightly and Experimental
Caveat: We don't have enough ARM VMs, so we need to skip ARM/Everything:
- (DO NOT RUN) mac-arm-monterey-clang-bazel-nightly-everything-address-sanitizer
- (DO NOT RUN) mac-arm-monterey-clang-bazel-nightly-everything-debug
- (still available) mac-arm-monterey-clang-bazel-experimental-everything-address-sanitizer
- (still available) mac-arm-monterey-clang-bazel-experimental-everything-debug

The remaining jobs:
- mac-arm-monterey-clang-bazel-experimental-address-sanitizer
- mac-arm-monterey-clang-bazel-experimental-debug
- mac-arm-monterey-clang-bazel-nightly-address-sanitizer
- mac-arm-monterey-clang-bazel-nightly-debug
- mac-x86-monterey-clang-bazel-experimental-address-sanitizer
- mac-x86-monterey-clang-bazel-experimental-debug
- mac-x86-monterey-clang-bazel-experimental-everything-address-sanitizer
- mac-x86-monterey-clang-bazel-experimental-everything-debug
- mac-x86-monterey-clang-bazel-nightly-address-sanitizer
- mac-x86-monterey-clang-bazel-nightly-debug
- mac-x86-monterey-clang-bazel-nightly-everything-address-sanitizer
- mac-x86-monterey-clang-bazel-nightly-everything-debug
~~We also want to have an Unprovisioned Experimental available for any regularly scheduled build flavor, so we'll need these as well:~~
- ~~mac-arm-monterey-unprovisioned-clang-bazel-experimental-address-sanitizer~~
- ~~mac-arm-monterey-unprovisioned-clang-bazel-experimental-debug~~
- ~~mac-arm-monterey-unprovisioned-clang-bazel-experimental-everything-address-sanitizer~~
- ~~mac-arm-monterey-unprovisioned-clang-bazel-experimental-everything-debug~~
- ~~mac-x86-monterey-unprovisioned-clang-bazel-experimental-address-sanitizer~~
- ~~mac-x86-monterey-unprovisioned-clang-bazel-experimental-debug~~
Discussion
In the above, the presumption is that anything not in Experimental should be in the Production tab(s). For newly-added build configurations though, they would presumably start outside of the Production tab(s) until we have them passing.
Continuous load
- mac-arm-monterey-clang-bazel-continuous-release (~90 minutes)
- mac-arm-ventura-clang-bazel-continuous-release (~90 minutes)
- mac-x86-monterey-clang-bazel-continuous-release (10-70 minutes)
- mac-x86-ventura-clang-bazel-continuous-release (10-70 minutes)
For the moment while we don't have caching working on ARM yet, it would be OK to demote ARM Ventura to Nightly instead of Continuous.
Nightly load: X86
- mac-x86-monterey-unprovisioned-clang-bazel-nightly-release (140-180 minutes)
- mac-x86-ventura-unprovisioned-clang-bazel-nightly-release (140-180 minutes)
- mac-x86-monterey-unprovisioned-clang-bazel-nightly-snopt-mosek-packaging (70-120 minutes)
- mac-x86-monterey-unprovisioned-clang-wheel-nightly-snopt-mosek-release (70-90 minutes)
- mac-x86-monterey-clang-cmake-nightly-release (5-30 minutes)
- mac-x86-ventura-clang-cmake-nightly-release (5-30 minutes)
- mac-x86-monterey-clang-cmake-nightly-everything-release (40-60 minutes)
- mac-x86-ventura-clang-cmake-nightly-everything-release (40-60 minutes)
- mac-x86-monterey-clang-bazel-nightly-everything-release (100-120 minutes)
- mac-x86-ventura-clang-bazel-nightly-everything-release (100-120 minutes)
- mac-x86-monterey-clang-bazel-nightly-debug (180-240 minutes)
- mac-x86-monterey-clang-bazel-nightly-everything-debug (240-300 minutes)
- mac-x86-monterey-clang-bazel-nightly-address-sanitizer (150-240 minutes)
- mac-x86-monterey-clang-bazel-nightly-everything-address-sanitizer (200-270 minutes)
TOTAL 1480-2040 minutes / 6 VMs = ~4-6 hours
Nightly load: ARM
- mac-arm-monterey-unprovisioned-clang-bazel-nightly-release (~100 minutes)
- mac-arm-ventura-unprovisioned-clang-bazel-nightly-release (~100 minutes)
- mac-arm-monterey-unprovisioned-clang-bazel-nightly-snopt-mosek-packaging (~40 minutes)
- mac-arm-monterey-unprovisioned-clang-wheel-nightly-snopt-mosek-release (~45 minutes)
- mac-arm-monterey-clang-cmake-nightly-release (~30 minutes)
- mac-arm-ventura-clang-cmake-nightly-release (~30 minutes)
- mac-arm-monterey-clang-cmake-nightly-everything-release (~20 minutes)
- mac-arm-ventura-clang-cmake-nightly-everything-release (~20 minutes)
- mac-arm-monterey-clang-bazel-nightly-everything-release (90-120 minutes)
- mac-arm-ventura-clang-bazel-nightly-everything-release (90-120 minutes)
- mac-arm-monterey-clang-bazel-nightly-debug (~210 minutes)
- mac-arm-monterey-clang-bazel-nightly-address-sanitizer (~210 minutes)
TOTAL 985-1045 minutes / 4 VMs = ~4-5 hours
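The per-VM estimate is just the summed run-time range divided by the number of VMs. A small sketch using the ARM nightly numbers above shows the arithmetic. It assumes work packs perfectly across VMs, which real scheduling will not quite achieve, and `nightly_load` is a hypothetical helper.

```python
# (min, max) run times in minutes for the ARM nightly jobs listed above.
ARM_NIGHTLY = [
    (100, 100), (100, 100),  # unprovisioned bazel release (Monterey, Ventura)
    (40, 40), (45, 45),      # packaging, wheel
    (30, 30), (30, 30),      # cmake release (Monterey, Ventura)
    (20, 20), (20, 20),      # cmake everything-release (Monterey, Ventura)
    (90, 120), (90, 120),    # bazel everything-release (Monterey, Ventura)
    (210, 210), (210, 210),  # bazel debug, address-sanitizer
]


def nightly_load(jobs, vms):
    """Return total (min, max) minutes and the ideal per-VM hours."""
    low = sum(lo for lo, _ in jobs)
    high = sum(hi for _, hi in jobs)
    return (low, high), (low / vms / 60, high / vms / 60)


total, hours = nightly_load(ARM_NIGHTLY, vms=4)
print(total, [round(h, 1) for h in hours])  # (985, 1045) [4.1, 4.4]
```

Because the division assumes ideal packing, the result is a lower bound. A single ~210-minute job already sets a floor of 3.5 hours no matter how many VMs are available.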
In #18088, mac-[x86]-big-sur-unprovisioned-clang-bazel-snopt-mosek-packaging was intentionally left in both Continuous and Nightly. In the proposed plan, the monterey equivalent has been removed from Continuous. Was this intentional?
Yeah, I was on the fence about that.
I don't mind losing the test coverage for that (waiting until Nightly to test-cover it should be OK).
I am a little sad to lose the "continuous packaging" tgz artifacts for macOS due to this change, but the ~1+ hour runtime seemed like too much for continuous, and anyway we don't have (and maybe can't afford) continuous Wheel packaging.
My thought is to get rid of it for now. We can add back later if we hate it.
I have some hope that once the cache servers are up and running everywhere, we can use the daytime headroom to trigger macOS during pre-merge, at least some of the time.
Some observations on CMake builds:
- We are losing all cmake-debug jobs
- cmake-unprovisioned jobs are experimental only
(For the record, I'm glad you're double-checking me on this.)
We are losing all cmake-debug jobs.
That's right. They are not very fast and are extremely unlikely to show anything that either cmake-release or bazel-debug would not already catch.
cmake-unprovisioned jobs are experimental only
That's right. They are not very fast and are extremely unlikely to show anything that either cmake (provisioned) or bazel-unprovisioned would not already catch.
We still want them in experimental in case someone is worried that it would catch something in a unique situation.
One additional CMake build comment: we're losing the Mac x86 Continuous CMake build but replacing it with a nightly. This seems reasonable and can be promoted if needed.
From Bazel uncached builds, we currently do not provide unprovisioned-address-sanitizer builds for either Linux or Mac. I don't think there's much value in adding them for Mac now.
From Bazel bootstrapping builds, compared to the current setup we are missing:
- mac-arm-monterey-unprovisioned-clang-bazel-nightly-everything-release (this exists but there's no x86 equivalent)
- mac-arm-monterey-unprovisioned-clang-bazel-experimental-everything-release (currently does not exist but it should if we have the nightly version)

Above, @svenevs suggests dropping these builds; I have no objections to that.
From Bazel uncached builds, we currently do not have any Mac unprovisioned debug builds. Historically, Mac debug builds are quite large and resulted in the x86 provisioned images being larger than the unprovisioned images. Currently x86 Monterey unprovisioned images are 160Gi and provisioned are 200Gi (they are 195Gi in both cases for ARM). In order to add unprovisioned debug builds we would have to resize the x86 Monterey unprovisioned images. I don't think we would gain enough to make that worthwhile, so I'm going to skip those for now.
In the Jenkins jobs PR I also added Ventura bazel experimental debug and everything-debug jobs so that they are available for testing, if needed.
I think we should be able to close this now? We still have Ventura updates but they should get their own issue(s).