macOS upgrade causes some build breakages
We recently updated macOS to 15.3.1 and Xcode 16.1 for all macOS VMs on Bazel CI.
https://buildkite.com/bazel/rules-apple-darwin/builds/10138#01958280-cdc9-4e72-85bb-7399fcdcc056
(01:03:28) ERROR: /Users/buildkite/builds/bk-macos-arm64-nn86/bazel/rules-apple-darwin/test/starlark_tests/targets_under_test/ios/BUILD:129:16: AssetCatalogCompile test/starlark_tests/targets_under_test/ios/app-intermediates/bundle_library_ios.bundle/xcassets failed: (Exit 1): xctoolrunner failed: error executing AssetCatalogCompile command (from target //test/starlark_tests/targets_under_test/ios:app)
(cd /private/var/tmp/_bazel_buildkite/3ebd711cd99f106e0bfcf0a4dddc286c/execroot/_main && \
exec env - \
APPLE_SDK_PLATFORM=iPhoneSimulator \
APPLE_SDK_VERSION_OVERRIDE=18.2 \
PATH=/Users/buildkite/Library/Caches/bazelisk/downloads/sha256/ac72ad67f7a8c6b18bf605d8602425182b417de4369715bf89146daf62f7ae48/bin:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/homebrew/bin \
XCODE_VERSION_OVERRIDE=16.2.0.16C5032a \
bazel-out/darwin_arm64-opt-exec-ST-d57f47055a04/bin/tools/xctoolrunner/xctoolrunner actool --compile '[ABSOLUTE]bazel-out/ios_sim_arm64-fastbuild-ios-sim_arm64-min12.0-applebin_ios-ST-26d4f6b9029b/bin/test/starlark_tests/targets_under_test/ios/app-intermediates/bundle_library_ios.bundle/xcassets' --platform iphonesimulator --minimum-deployment-target 12.0 --compress-pngs --target-device iphone '[ABSOLUTE]test/starlark_tests/resources/assets.xcassets')
# Configuration: 8931e66e98dfc14da7a42f9369975f695461c8159360912aca94c6ca1a76bc52
# Execution platform: @@platforms//host:host
/Users/buildkite/builds/bk-macos-arm64-nn86/bazel/rules-apple-darwin/test/starlark_tests/resources/assets.xcassets: error: No simulator runtime version from [<DVTBuildVersion 21F79>, <DVTBuildVersion 22D8075>] available to use with iphonesimulator SDK version <DVTBuildVersion 22C146>
@fweikert @brentleyjones
The tests request Xcode 15.4, which is no longer installed on our macOS workers. That's why CI bumps the version ("Fixed Xcode version: 15.4 -> 16.2..."), which might cause the problem. @brentleyjones is there something we need to do here, or is it a fix in rules_apple?
Even when using Xcode 16.2, we still hit the simulator runtime issue. @aaronsky has some experience with that and can give more context.
For the period we're using Xcode 16.2 on the CI nodes, xcodebuild -downloadPlatform iOS will install the iOS 18.3.1 simulator runtime by default, rather than the originally distributed iOS 18.2. The rules expect by default that the SDK version (18.2) matches the simulator runtime version (now 18.3, sometimes represented as 18.3.1). The best thing to do in the short term would be to add something like this somewhere in the VM/image setup:
# ...
# somewhere after running `xcodebuild -runFirstLaunch` and `xcodebuild -downloadPlatform iOS`
xcode_short_version="$(xcodebuild -version | head -n1 | cut -d' ' -f2 | cut -d. -f1,2)"
if [ "$xcode_short_version" = "16.2" ]; then
xcodebuild -downloadPlatform iOS -buildVersion 18.2
fi
# ...
# xcodebuild -checkFirstLaunchStatus
This was the best workaround I could find in my own environment to make Bazel, simulator_creator.py, and xcodebuild happy. I wouldn't recommend running this on every CI job, as -downloadPlatform is very slow and needs some time to finish mounting the simulator runtime disk image after it's been installed.
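If it helps, here's a rough sketch of how the setup script could wait for the pinned runtime to finish mounting before the job continues; the 'iOS 18.2' grep pattern and the timeout are assumptions on my part, not something the rules require:
# Hypothetical polling loop: wait up to ~5 minutes for the freshly downloaded
# iOS 18.2 runtime to report "Ready" in simctl's list, since -downloadPlatform
# can return before the disk image has finished mounting.
for _ in $(seq 1 30); do
  if xcrun simctl runtime list | grep 'iOS 18.2' | grep -q 'Ready'; then
    break
  fi
  sleep 10
done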
@aaronsky Where is this code? Can we fix the code to not make the assumption?
The code where this all breaks down is in these spots:
- https://github.com/bazelbuild/rules_apple/blob/master/apple/testing/default_runner/ios_xctestrun_runner.bzl#L53
- https://github.com/bazelbuild/rules_apple/blob/master/apple/testing/default_runner/simulator_creator.py#L85
- (and to a lesser extent) https://github.com/bazelbuild/rules_apple/blob/master/apple/testing/default_runner/ios_test_runner.bzl#L45
When I examined this a couple weeks ago I couldn't figure out another sensible default that could be used to replace the assumption without requiring a new field or some new functionality in xcode_version. Making os_version required on the test runner would break building across different Xcode versions.
I'm mostly worried that the next time we upgrade macOS this will happen again. Is there a long-term solution for this that you have in mind?
That's a reasonable concern, and no, I don't have a long-term plan in the event Apple drops a new runtime on us out of the blue again. I agree it needs a more robust solution beyond this workaround.
We added
xcode_short_version="$(xcodebuild -version | head -n1 | cut -d' ' -f2 | cut -d. -f1,2)"
if [ "$xcode_short_version" = "16.2" ]; then
xcodebuild -downloadPlatform iOS -buildVersion 18.2
fi
to our setup script and updated the VMs. Can you please verify whether it works now?
Looks like rules_apple is still red: https://buildkite.com/bazel/rules-apple-darwin/builds/10157
@aaronsky Can you try to fix this from the rules_apple side? I don't know what else we could do on the infra side, and building and deploying a new VM image is not trivial.
There still seems to be a mismatch related to the installed simulator runtimes, but this time with the visionOS runtime, which, as far as I'm aware, hasn't received an update recently. I don't recognize runtime 22N895, but 21O5565d is the SDK that shipped alongside Xcode 15.4, and 22N799 is the SDK in Xcode 16.2.
@aaronsky I believe this has to be fixed from rules_apple side, so I'm closing this now.
@meteorcloudy while I work on this from the rules_apple side, there is at least one other thing I need done on the macOS image (if you wouldn't mind reopening this issue). It appears as though the xros1.2 simulator runtime is still installed on the image, and it's confusing Xcode. Can we please see about removing the xros1.2 runtime and keeping xros2.2?
Alternatively, we could use xcrun simctl runtime match set to forcibly map SDKs to simulator runtimes. But neither of these is something we can do from rules_apple on bazelci.
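For reference, a sketch of what that could look like on the VM; I haven't verified the exact set syntax on these images, so treat the mapping step as an assumption and check the built-in help first:
# Inspect which simulator runtime each installed SDK currently resolves to
# (assuming this Xcode's simctl exposes the `runtime match` subcommands).
xcrun simctl runtime match list
# `runtime match set` can pin an SDK build (e.g. 22N799) to a runtime build
# (e.g. 22N895); the argument order varies between Xcode releases, so consult:
xcrun simctl runtime match set --help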
The shell_commands in https://github.com/bazelbuild/rules_apple/pull/2677 (sorry for misusing them) show that the installed/configured Xcode 16.2 is definitely confused about how it selects the underlying sim runtime. The command that was run:
APPLE_SDK_PLATFORM=XRSimulator APPLE_SDK_VERSION_OVERRIDE=2.2 xcrun actool --compile 'doc' --platform xrsimulator --minimum-deployment-target 1.0 --compress-pngs --target-device vision 'examples/resources/VisionAppIcon.xcassets'
The output (matching what rules_apple tests are showing):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.actool.compilation-results</key>
<dict>
<key>output-files</key>
<array/>
</dict>
<key>com.apple.actool.errors</key>
<array>
<dict>
<key>description</key>
<string>No simulator runtime version from [<DVTBuildVersion 21O5565d>, <DVTBuildVersion 22N895>] available to use with xrsimulator SDK version <DVTBuildVersion 22N799></string>
</dict>
</array>
</dict>
</plist>
This should just use xrOS 2.3 (22N895). This command works for me locally with just 22N895 present in xcrun simctl runtime list.
I think this should be reopened and a simple actool command like that should be confirmed to succeed before it's considered closed.
Anything we can do to help get this fixed on the macOS runners? rules_apple is exploring fairly significant workarounds, but correcting Xcode/the machines would be drastically better for the maintainers.
@fweikert Can you take a look?
I upgraded my VM to 15.4, and the command succeeds even though xrOS 1.2 is present:
== Disk Images ==
-- iOS --
iOS 18.3.1 (22D8075) - 819CE51A-FB18-412A-B149-654606CA3742 (Ready)
iOS 18.2 (22C150) - E22DA566-DE6E-440A-B6AA-C04F42E9A284 (Ready)
iOS 17.5 (21F79) - 2AED0805-8F1F-419F-9E02-5B38734BDD31 (Ready)
-- tvOS --
tvOS 18.2 (22K154) - A7AED462-E128-4A8C-9216-ABC2F4430ADF (Ready)
tvOS 17.5 (21L569) - CA2800E5-0E8B-41F4-BE18-EFCBB2B0509F (Ready)
-- watchOS --
watchOS 11.2 (22S99) - D0272B69-66EF-43E7-9421-160EFF5F6D59 (Ready)
watchOS 10.5 (21T575) - 8A918DA9-96C6-40F5-A2CC-3E46144AE937 (Ready)
-- xrOS --
xrOS 1.2 (21O5565d) - 0D8411AF-B9C3-4BB7-AA33-822A029A5A36 (Ready)
xrOS 2.3 (22N895) - 0B4E0E33-96F0-458B-871F-A6A3BEB6A559 (Ready)
Total Disk Images: 9 (53.1G)
$ APPLE_SDK_PLATFORM=XRSimulator APPLE_SDK_VERSION_OVERRIDE=2.2 xcrun actool --compile 'doc' --platform xrsimulator --minimum-deployment-target 1.0 --compress-pngs --target-device vision 'examples/resources/VisionAppIcon.xcassets'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.actool.compilation-results</key>
<dict>
<key>output-files</key>
<array>
<string>/Users/ci/fwe_test/rules_apple/doc/Assets.car</string>
</array>
</dict>
</dict>
</plist>
Nevertheless, I removed the old runtime via xcrun simctl runtime delete 21O5565d. I'll test the new image in QA soon.
Amazing, thank you, @fweikert! I don't think you necessarily need to remove 21O5565d; rather, clear out any internal Xcode references to 22N799. But whatever is easiest for you all will be great, thanks.
Even with only 2.3 it's still failing. Where does the reference to 2.2 (22N799) come from?
My guess is that it's some internal state within Xcode. Perhaps retained from a previous version of Xcode before upgrading to the one associated with 22N895. I haven't tried to narrow it down much though. Hard to do without poking around on the VMs.
But that's what I was trying to convey: it's not the presence of the two in the list that is the problem. It's that Xcode thinks it can access something that doesn't actually exist, and therefore the requested SDK version maps to something that doesn't exist.
I'm not that familiar with Xcode, so I'll do some more digging.
Interestingly, system_profiler -json SPDeveloperToolsDataType shows visionOS 2.2, whereas xcrun simctl runtime list only shows 2.3.
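For anyone reproducing this on the image, the two views can be compared directly; the grep patterns below are only illustrative:
# What the OS itself reports as installed developer platforms/SDKs:
system_profiler -json SPDeveloperToolsDataType | grep -i -A 2 vision
# What CoreSimulator actually has mounted:
xcrun simctl runtime list | grep -i -E 'xros|visionos'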
I have no idea what's going on. Xcode > Components shows "visionOS 2.2 (22N799) SDK + visionOS 2.3 (22N895) Simulator", and I have no option to install anything else.
I'm still looking at this off and on. At this point, updating to 16.3 might sidestep some pieces ¯\_(ツ)_/¯
Are these VMs heavily resource-constrained? These actions should be sub-second:
(01:21:45) [12,677 / 17,352] 6 actions running
--
| AssetCatalogCompile .../ios/static_framework_with_transitive_resources-intermediates/xcassets; 255s local, remote-cache
| //test/starlark_tests/targets_under_test/ios:static_framework_with_transitive_resources; 250s local, remote-cache
| AssetCatalogCompile test/starlark_tests/targets_under_test/visionos/app-intermediates/xcassets; 247s local, remote-cache
| ProcessEntitlementsFiles .../targets_under_test/watchos/single_target_app_entitlements.entitlements; 192s remote-cache, darwin-sandbox
| ProcessEntitlementsFiles .../watchos/ios_watchos_with_watchos_extension_entitlements.entitlements; 166s remote-cache, darwin-sandbox
I'm seeing this pretty regularly while trying to debug this issue.
This also happens for actions that aren't part of this repo:
[230 / 230] no actions running
| Fetching repository @@bazel_tools+xcode_configure_extension+local_config_xcode; Building xcode-locator 293s
| Fetching repository @@apple_support++apple_cc_configure_extension+local_config_apple_cc; starting 137s
|
which likely affects other projects' CI too
Here's a set of jobs I cancelled while these hung: https://buildkite.com/bazel/rules-apple-darwin/builds/10260#01964606-787b-4128-81bd-4dfef3218688
It's possible there's a GUI prompt related to Xcode that's just hanging forever.
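One hedged guess worth ruling out on the image, assuming the hang really is a blocked dialog rather than resource starvation:
# Make sure Xcode's first-launch tasks and license acceptance have already been
# handled non-interactively, so nothing can pop a blocking GUI prompt on a worker.
xcodebuild -checkFirstLaunchStatus
sudo xcodebuild -runFirstLaunch
sudo xcodebuild -license accept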
Here's a job that spent >3 minutes cloning the repo: https://buildkite.com/bazel/rules-apple-darwin/builds/10261#01964613-6d3e-423b-87b6-fbc49dedbb64
Something fishy seems to be going on, since this repo is very small.
I applied some workarounds in https://github.com/bazelbuild/rules_apple/pull/2679/; notably, for this thread, I had to delete simulators and set up the visionOS simulator manually. I think this is fine, and I can drop it when Xcode is upgraded again.
I'm still interested in the performance aspects mentioned above ^
For a data point on perf, here's a scheduled job that ran all the tests (no caching) before this update in 25 minutes: https://buildkite.com/bazel/rules-apple-darwin/builds/10108#01955d8b-1375-4b44-bc54-731e6d05524f
My green build before merging these workarounds took >1 hour: https://buildkite.com/bazel/rules-apple-darwin/builds/10292#_
My workarounds include limiting --jobs and related flags, so the comparison isn't entirely fair, but without that limit the builds just timed out instead.
@meteorcloudy @fweikert Any update on this? rules_apple CI is unbearable now, and tests regularly time out.
The Mac machines are very resource-constrained, so running multiple large integration tests in parallel will likely cause tests to be flaky or to time out. We had to limit the local test job count for Bazel to 2: https://github.com/bazelbuild/bazel/blob/ba6f6f7ca8c9e377afdf22d05a8860d2b1adbc20/.bazelci/presubmit.yml#L207-L208
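For anyone tuning this locally, the equivalent command-line form would be roughly the following (the target pattern is just an example, not what the presubmit config uses):
# Cap the number of test actions Bazel runs locally in parallel on the Mac workers.
bazel test --local_test_jobs=2 //test/...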