[Flake] kokoro macOS aborting VM command due to timeout
https://source.cloud.google.com/results/invocations/adea03ee-63c1-4106-8902-fc928efba51d
2022-07-27T03:59:21Z (+574s)
--------------------------------
| Compiling using all CPUs |
--------------------------------
T+0.065s [1/2480] Building CXX object google/cloud/CMakeFiles/rest_internal_internal_unified_rest_credentials_test.dir/internal/unified_rest_credentials_test.cc.o
:
T+166.771s [706/2480] Linking CXX executable google/cloud/storage/examples/storage_public_object_samples
T+167.149s [707/2480] Linking CXX executable google/cloud/storage/examples/storage_lifecycle_management_samples
ERROR: Aborting VM command due to timeout of 7200 seconds
[ID: 3480291] Command finished after 7215 secs, exit value: 1
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
[22:50:02] Collecting build artifacts from build VM
Build script failed with exit code: 1
So, it looks like that last command started at about T+574 + 167 == T+741, and then we see nothing until the 7200s timeout. :-(
Note that "2022-07-26 22:50:02 -07:00" is "2022-07-27 05:50:02 +00:00", so indeed that is almost 2h after the "Compiling using all CPUs" message.
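The timeline above can be double-checked with a few lines of Python. This is only a sketch: it assumes, as the comment above does, that `(+574s)` is the offset of the "Compiling using all CPUs" banner from the start of the VM command, and that the `T+` values count seconds from that banner.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log above.
banner_utc = datetime(2022, 7, 27, 3, 59, 21, tzinfo=timezone.utc)
banner_offset_s = 574    # "(+574s)": assumed seconds since the VM command started
last_line_s = 167        # "T+167.149s": assumed seconds since the banner, truncated
finished_after_s = 7215  # "Command finished after 7215 secs"
timeout_s = 7200         # "timeout of 7200 seconds"

# Work backwards to the command start, then forwards to the finish time.
command_start = banner_utc - timedelta(seconds=banner_offset_s)
finished_utc = command_start + timedelta(seconds=finished_after_s)

# Offset of the last log line from the command start, and the silent gap after it.
last_output_s = banner_offset_s + last_line_s
silent_s = timeout_s - last_output_s

# The "[22:50:02]" line is printed in a -07:00 zone.
local = timezone(timedelta(hours=-7))
print(finished_utc.astimezone(local).strftime("%H:%M:%S"))
print(last_output_s, silent_s)
```

Under those assumptions the computed finish time is 05:50:02 UTC (22:50:02 at -07:00), which matches the `[22:50:02]` line, and the build printed nothing for roughly 6459 seconds before the timeout fired.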
Maybe we should reopen #3645? I am not sure: I like that this bug is more specific.
> Maybe we should reopen #3645?
My searching certainly failed me there (despite my memory telling me otherwise).
As for the question, now that there is a link between this and that, I think we're probably fine, but I am happy to move this over too.
Similar enough:
https://source.cloud.google.com/results/invocations/7e518dce-6134-4fb9-97a3-f32a9e289868
https://source.cloud.google.com/results/invocations/d203e043-db50-4784-b765-ff99b3cd494b
https://source.cloud.google.com/results/invocations/c739a5d9-2132-4ee1-a757-f9c7d1090d75
I think we have abused this bug. The original report was for a build that stopped reporting progress until it timed out. The other reports seem to be for builds that continue making progress and still time out. I am going to ignore this difference for now.
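For anyone who does want to separate the two cases later, a rough sketch of how the logs could be told apart is below. It is not something the CI does today: the function names and the 30-minute `stall_s` threshold are hypothetical, and it assumes progress lines keep the `T+<seconds>s [done/total]` shape shown in the log above.

```python
import re

# Matches progress lines such as "T+166.771s [706/2480] Linking CXX executable ..."
PROGRESS = re.compile(r"^T\+(\d+(?:\.\d+)?)s \[(\d+)/(\d+)\]")


def last_progress(log_lines):
    """Return (seconds, done, total) for the last progress line, or None."""
    last = None
    for line in log_lines:
        m = PROGRESS.match(line)
        if m:
            last = (float(m.group(1)), int(m.group(2)), int(m.group(3)))
    return last


def classify(log_lines, budget_s, stall_s=1800.0):
    """Label a timed-out build as 'stalled' or 'slow'.

    budget_s: seconds between the first T+ line and the timeout.
    stall_s:  hypothetical threshold for "silent for too long".
    """
    last = last_progress(log_lines)
    if last is None:
        return "unknown"
    seconds, _done, _total = last
    # A long gap between the last progress line and the timeout means the
    # build stopped reporting; a short gap means it was still working.
    return "stalled" if budget_s - seconds > stall_s else "slow"


original_report = [
    "T+166.771s [706/2480] Linking CXX executable google/cloud/storage/examples/storage_public_object_samples",
    "T+167.149s [707/2480] Linking CXX executable google/cloud/storage/examples/storage_lifecycle_management_samples",
]
# 7200s timeout minus the 574s that elapsed before the first T+ line.
print(classify(original_report, budget_s=7200 - 574))
```

With the two lines from the original report this prints `stalled`; a log whose last progress line sits close to the timeout would be labeled `slow` instead.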
From #10272:
For example:
https://source.cloud.google.com/results/invocations/7feaaa95-8bb3-4147-af74-d47e0e941b7d
The code is so large that any changes touching google_cloud_cpp_common seem to take more than 7200 seconds. I think we just need to increase the timeout value.
Note that by the time the CI build runs the Bazel cache is warm. This will not come up in our regular ways to detect flakes.