[ci] Enable coredump upload for macOS

Open xmkg opened this issue 4 months ago • 6 comments

Mostly CI improvements that came out of a two-day session troubleshooting the macos-15 crash:

  • Enable coredump generation for multipass_tests by codesigning it with a proper entitlement
  • Enable coredump generation at kernel level by setting kern.coredump and kern.corefile
  • Upload core dump + test executable to GitHub Artifacts on segv
  • Enable tmate session when test execution fails with segv
  • Add a protobuf version validation macro to tests/main.cpp
  • ~~Get rid of the generate_grpc_cpp, use protobuf_generate function~~
  • ~~Pin macos-15 xcode version to 16.0~~
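
The macro added to tests/main.cpp is not shown in this thread, so the following is only a rough sketch of what a protobuf version validation could look like. It assumes the version is available as an integer laid out as major * 1000000 + minor * 1000 + patch (the layout protobuf's `GOOGLE_PROTOBUF_VERSION` has used, as far as I know). All names (`ProtoVersion`, `decode_version`, `same_major_minor`, `describe_version`, `verify_runtime_version`) and the two constants are hypothetical placeholders, not code from the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: the three components of an encoded version.
struct ProtoVersion
{
    int major_part;
    int minor_part;
    int patch_part;
};

// Split an integer laid out as major * 1000000 + minor * 1000 + patch.
constexpr ProtoVersion decode_version(int encoded)
{
    return ProtoVersion{encoded / 1000000, (encoded / 1000) % 1000, encoded % 1000};
}

// Two versions are treated as compatible here when major and minor agree.
// This policy is an assumption made for the sketch.
constexpr bool same_major_minor(int lhs, int rhs)
{
    return decode_version(lhs).major_part == decode_version(rhs).major_part &&
           decode_version(lhs).minor_part == decode_version(rhs).minor_part;
}

// Human-readable form, useful in a failure message.
inline std::string describe_version(int encoded)
{
    const ProtoVersion v = decode_version(encoded);
    return std::to_string(v.major_part) + "." + std::to_string(v.minor_part) + "." +
           std::to_string(v.patch_part);
}

// Placeholder values for illustration only. In a real build the first would
// come from the protobuf headers and the second from the build configuration.
constexpr int header_version = 5029003;
constexpr int expected_version = 5029003;

// Compile-time check: fails the build when the headers are not the expected ones.
static_assert(same_major_minor(header_version, expected_version),
              "protobuf headers do not match the version the build expects");

// Runtime counterpart for a version that is only known after linking.
inline void verify_runtime_version(int runtime_version)
{
    assert(same_major_minor(header_version, runtime_version) &&
           "linked protobuf library does not match the headers");
}
```

A compile-time check like this would catch a header mismatch before any test runs, while the runtime variant covers the case where the headers and the linked library disagree.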

Signed-off-by: Mustafa Kemal Gilor [email protected]

xmkg avatar Sep 05 '25 11:09 xmkg

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 89.07%. Comparing base (525fa4c) to head (4bc59ac).
:warning: Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4343   +/-   ##
=======================================
  Coverage   89.07%   89.07%           
=======================================
  Files         240      240           
  Lines       15263    15263           
=======================================
  Hits        13595    13595           
  Misses       1668     1668           

codecov[bot] avatar Sep 05 '25 11:09 codecov[bot]

It turns out that our CI crashes were caused by a silent AppleClang update in the macOS-15 runner image. The official docs say that AppleClang 16 is still the default, but our CMake configure output says otherwise:

-- The C compiler identification is AppleClang 17.0.0.17000013
-- The CXX compiler identification is AppleClang 17.0.0.17000013
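
A silent toolchain change like this can also be surfaced from inside the test binary itself. The sketch below builds a similar identification string from the compiler's predefined macros. The function name `compiler_id` is hypothetical, and the assumption that the fourth component CMake prints corresponds to `__apple_build_version__` is mine, not something stated in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describe the compiler that built this translation unit,
// using only predefined macros. AppleClang must be tested before plain Clang
// because it defines __clang__ as well.
inline std::string compiler_id()
{
    std::string id;
#if defined(__apple_build_version__)
    id = "AppleClang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__) + "." +
         std::to_string(__clang_patchlevel__) + "." +
         std::to_string(__apple_build_version__);
#elif defined(__clang__)
    id = "Clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__) + "." +
         std::to_string(__clang_patchlevel__);
#elif defined(__GNUC__)
    id = "GNU " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__) + "." +
         std::to_string(__GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
    id = "MSVC " + std::to_string(_MSC_VER);
#else
    id = "unknown";
#endif
    assert(!id.empty()); // every branch above assigns a non-empty label
    return id;
}
```

Printing `compiler_id()` once at the start of a test run would put the compiler version into the test log, so a runner image update shows up next to the first failing run.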

Some of the tests in test_cli_client.cpp have started segfaulting for no apparent reason. The backtrace looks as follows:

thread #1, stop reason = ESR_EC_IABORT_EL0 (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00000001015d82b4 multipass_tests`char const* google::protobuf::internal::TcParser::MpMap<false>(google::protobuf::MessageLite*, char const*, google::protobuf::internal::ParseContext*, google::protobuf::internal::TcFieldData, google::protobuf::internal::TcParseTableBase const*, unsigned long long) + 388
    frame #2: 0x00000001015fdb70 multipass_tests`bool google::protobuf::internal::MergeFromImpl<false>(google::protobuf::io::ZeroCopyInputStream*, google::protobuf::MessageLite*, google::protobuf::internal::TcParseTableBase const*, google::protobuf::MessageLite::ParseFlags) + 180
    frame #3: 0x00000001006f8128 multipass_tests`grpc::Status grpc::GenericDeserialize<grpc::ProtoBufferReader, multipass::SSHInfoRequest>(buffer=0x000000016f96fd78, msg=0x000000016f96ffc0) at proto_utils.h:87:15 [opt]
    frame #4: 0x00000001006f7ee0 multipass_tests`grpc::internal::CallOpRecvMessage<multipass::SSHInfoRequest>::FinishOp(bool*) [inlined] grpc::SerializationTraits<multipass::SSHInfoRequest, void>::Deserialize(buffer=0x000000016f96fd78, msg=<unavailable>) at proto_utils.h:113:12 [opt]
    frame #5: 0x00000001006f7ec8 multipass_tests`grpc::internal::CallOpRecvMessage<multipass::SSHInfoRequest>::FinishOp(this=0x000000016f96fd68, status=0x000000016f96fd1f) at call_op_set.h:450:13 [opt]
    frame #6: 0x0000000101050504 multipass_tests`grpc::internal::CallOpSet<grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<multipass::LaunchReply>, grpc::internal::CallNoOp<3>, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6>>::FinalizeResult(this=0x000000016f96fd50, tag=0x000000016f96fd10, status=0x000000016f96fd1f) at call_op_set.h:923:16 [opt]
    frame #7: 0x00000001006f6f4c multipass_tests`grpc::CompletionQueue::Pluck(this=0x0000600001fb9fd0, tag=0x000000016f96fd50) at completion_queue.h:326:16 [opt]
    frame #8: 0x000000010106ced8 multipass_tests`grpc::ClientReaderWriter<multipass::SSHInfoRequest, multipass::SSHInfoReply>::Read(this=0x0000600001fb9fb0, msg=<unavailable>) at sync_stream.h:486:16 [opt]
    frame #9: 0x0000000100f3c7f0 multipass_tests`multipass::cmd::Transfer::run(multipass::ArgParser*) [inlined] multipass::ReturnCode multipass::cmd::Command::dispatch<std::__1::unique_ptr<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>, std::__1::default_delete<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>>> (multipass::Rpc::StubInterface::*&)(grpc::ClientContext*), multipass::SSHInfoRequest, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_0&, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_1&, multipass::ReturnCode multipass::cmd::Command::dispatch<std::__1::unique_ptr<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>, std::__1::default_delete<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>>> (multipass::Rpc::StubInterface::*)(grpc::ClientContext*), multipass::SSHInfoRequest, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_0&, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_1&>(std::__1::unique_ptr<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>, std::__1::default_delete<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>>> (multipass::Rpc::StubInterface::*&&)(grpc::ClientContext*), multipass::SSHInfoRequest const&, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_0&, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_1&)::'lambda'(multipass::SSHInfoReply&, grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>*)>(this=0x00006000013b88f0, rpc_func=<unavailable>, request=<unavailable>, on_success=<unavailable>, on_failure=<unavailable>, streaming_callback=<unavailable>) at command.h:92:24 [opt]
    frame #10: 0x0000000100f3c7ac multipass_tests`multipass::cmd::Transfer::run(multipass::ArgParser*) [inlined] multipass::ReturnCode multipass::cmd::Command::dispatch<std::__1::unique_ptr<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>, std::__1::default_delete<grpc::ClientReaderWriterInterface<multipass::SSHInfoRequest, multipass::SSHInfoReply>>> (multipass::Rpc::StubInterface::*)(grpc::ClientContext*), multipass::SSHInfoRequest, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_0&, multipass::cmd::Transfer::run(multipass::ArgParser*)::$_1&>(this=0x00006000013b88f0, rpc_func=<unavailable>, request=<unavailable>, on_success=<unavailable>, on_failure=<unavailable>) at command.h:150:16 [opt]
    frame #11: 0x0000000100f3c7ac multipass_tests`multipass::cmd::Transfer::run(this=0x00006000013b88f0, parser=<unavailable>) at transfer.cpp:106:12 [opt]
    frame #12: 0x0000000100e5310c multipass_tests`multipass::Client::run(this=0x000000016f970730, arguments=<unavailable>) at client.cpp:159:75 [opt]
    frame #13: 0x00000001006c8d90 multipass_tests`(anonymous namespace)::Client::setup_client_and_run(this=<unavailable>, command=<unavailable>, term=<unavailable>) at test_cli_client.cpp:234:23 [opt]
    frame #14: 0x00000001006d2458 multipass_tests`(anonymous namespace)::Client_transferCmdInstanceSourceLocalTarget_Test::TestBody() [inlined] (anonymous namespace)::Client::send_command(this=0x000000011e83ba00, command=size=3, cout=<unavailable>, cerr=<unavailable>, cin=<unavailable>) at test_cli_client.cpp:243:16 [opt]
    frame #15: 0x00000001006d2428 multipass_tests`(anonymous namespace)::Client_transferCmdInstanceSourceLocalTarget_Test::TestBody(this=0x000000011e83ba00) at test_cli_client.cpp:640:5 [opt]
    frame #16: 0x0000000100f9147c multipass_tests`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [inlined] void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=<unavailable>, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2671:10 [opt]
    frame #17: 0x0000000100f9146c multipass_tests`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x000000011e83ba00, method=0x00000000000000000000000000000020, location="the test body") at gtest.cc:2707:14 [opt]
    frame #18: 0x0000000100f91358 multipass_tests`testing::Test::Run(this=0x000000011e83ba00) at gtest.cc:2746:5 [opt]
    frame #19: 0x0000000100f926c4 multipass_tests`testing::TestInfo::Run(this=0x000000011df11780) at gtest.cc:2892:11 [opt]
    frame #20: 0x0000000100f9374c multipass_tests`testing::TestSuite::Run(this=0x000000011df11440) at gtest.cc:3070:30 [opt]
    frame #21: 0x0000000100fa3d38 multipass_tests`testing::internal::UnitTestImpl::RunAllTests(this=0x000000011df06010) at gtest.cc:6062:44 [opt]
    frame #22: 0x0000000100fa3458 multipass_tests`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) [inlined] bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=<unavailable>, method=(multipass_tests`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5917), location="auxiliary test code (environments or event listeners)") at gtest.cc:2671:10 [opt]
    frame #23: 0x0000000100fa3448 multipass_tests`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x000000011df06010, method=(multipass_tests`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5917), location="auxiliary test code (environments or event listeners)") at gtest.cc:2707:14 [opt]
    frame #24: 0x0000000100fa33b0 multipass_tests`testing::UnitTest::Run(this=0x0000000101eeea38) at gtest.cc:5602:10 [opt]
    frame #25: 0x00000001004ef680 multipass_tests`main [inlined] RUN_ALL_TESTS() at gtest.h:2337:73 [opt]
    frame #26: 0x00000001004ef678 multipass_tests`main(argc=1, argv=0x000000016f9715c0) at main.cpp:35:12 [opt]
    frame #27: 0x00000001858d2b98 dyld`start + 6076

After 2 days of whack-a-mole, I've figured out that it works fine with AppleClang 16, but not with AppleClang 17. The symptoms, the version, and the Clang factor suggest that we might be suffering from the following protobuf bug:

https://github.com/protocolbuffers/protobuf/issues/21447

xmkg avatar Sep 08 '25 17:09 xmkg

~~I'll likely split this PR into separate ones.~~

xmkg avatar Sep 10 '25 19:09 xmkg

@ricab, it looks like we're going to need to land this to troubleshoot https://github.com/canonical/multipass/actions/runs/18710495978/job/53357608706?pr=4455. Would you mind reviewing it if you have the time?

xmkg avatar Oct 22 '25 12:10 xmkg

> @ricab, it looks like we're going to need to land this to troubleshoot https://github.com/canonical/multipass/actions/runs/18710495978/job/53357608706?pr=4455. Would you mind reviewing it if you have the time?

... and here too: https://github.com/canonical/multipass/actions/runs/18709979670/job/53355922058?pr=4463

EDIT: Another one: https://github.com/canonical/multipass/actions/runs/18717269864/job/53379892683?pr=4464

xmkg avatar Oct 22 '25 13:10 xmkg

Hey @xmkg, do you mean a proper review of the code or just confirmation from my side? I haven't been following closely, but I am totally fine with it in principle FWIW. I can squeeze it in if you really are looking for a third reviewer.

ricab avatar Oct 22 '25 14:10 ricab