ponyc
Add linux musl tests on Arm
I notice that the tests didn't pass for musl.
Yup. We appear to have a problem when running on musl.
Setting up Alpine on a Pi to test this looks pretty irritating. My first try might be to see if Docker can run on Raspbian.
Rebased against main. Still need to investigate the cause of the musl failures.
It looks like it is the test harness that is segfaulting. Interesting.
The image used for this is based on Alpine 3.12 and needs to be updated to be based on 3.16.
@ergl could you try running this in Docker on an Arm machine to debug what is going on? It appears the runner crashes, I think, but I don't have any way to debug this easily.
@SeanTAllen Sure, I can try it tomorrow when I have some time off.
@SeanTAllen I managed to reproduce the segfault but unfortunately ran out of time to debug the problem. To trigger a segfault, it is enough to run the runner:
# ./build/build_debug/test/libponyc-run/runner/runner -h
Segmentation fault
I didn't have lldb installed on the Docker image, so I was out of luck. But that would be the next step.
Thanks @ergl. Will you be doing the next step?
@SeanTAllen I can, but not until next week.
Here's the backtrace I get from the runner:
* thread #3, name = 'runner', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
* frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
frame #8: 0x0000aaaaaab0e1b4 runner`pony_error at posix_except.c:37:3
frame #9: 0x0000aaaaaaae05d0 runner`___lldb_unnamed_symbol2258 + 1016
frame #10: 0x0000aaaaaaade728 runner`___lldb_unnamed_symbol2250 + 840
frame #11: 0x0000aaaaaaadfc20 runner`___lldb_unnamed_symbol2253 + 856
frame #12: 0x0000aaaaaaae4c70 runner`___lldb_unnamed_symbol2286 + 2128
frame #13: 0x0000aaaaaaad4f00 runner`Main_Dispatch + 92
frame #14: 0x0000aaaaaab03e90 runner`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
frame #15: 0x0000aaaaaab035d0 runner`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
frame #16: 0x0000aaaaaab1383c runner`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
frame #17: 0x0000aaaaaab12d14 runner`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
frame #18: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
frame #19: 0x0000fffff7fad418 ld-musl-aarch64.so.1
frame #20: 0x0000aaaaaab1c654 runner`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
frame #21: 0x0000aaaaaab12c3c runner`ponyint_sched_start(library=false) at scheduler.c:1210:9
frame #22: 0x0000aaaaaab15008 runner`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc18) at start.c:332:7
frame #23: 0x0000aaaaaab033c0 runner`main + 240
frame #24: 0x0000fffff7f73274 ld-musl-aarch64.so.1
It seems like the source of the failure is here: pony_error at posix_except.c:37:3
which points to this:
https://github.com/ponylang/ponyc/blob/f6e1b60cab21a8da21ca414e75595a8b497bdb5c/src/libponyrt/lang/posix_except.c#L37
So it seems like we're doing something wrong when it comes to exception unwinding. This can be verified with this minimal Pony program that also causes the segfault:
actor Main
  new create(env: Env) =>
    try
      error
    end
The backtrace for the above program:
* thread #3, name = 'crash_example', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
* frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
frame #8: 0x0000aaaaaaab59c0 crash_example`pony_error at posix_except.c:37:3
frame #9: 0x0000aaaaaaaaa4f0 crash_example`Main_Dispatch + 56
frame #10: 0x0000aaaaaaaac5e4 crash_example`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
frame #11: 0x0000aaaaaaaabd24 crash_example`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
frame #12: 0x0000aaaaaaabaa6c crash_example`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
frame #13: 0x0000aaaaaaab9f44 crash_example`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
frame #14: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
frame #15: 0x0000fffff7fad418 ld-musl-aarch64.so.1
frame #16: 0x0000aaaaaaac450c crash_example`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
frame #17: 0x0000aaaaaaab9e6c crash_example`ponyint_sched_start(library=false) at scheduler.c:1210:9
frame #18: 0x0000aaaaaaabc238 crash_example`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc68) at start.c:332:7
frame #19: 0x0000aaaaaaaabbc0 crash_example`main + 240
frame #20: 0x0000fffff7f73274 ld-musl-aarch64.so.1
Edit: this was tested against the latest commit of this branch (1e82be3cfb784eef74311202a81838999be8497d)
Given this only happens on Arm musl, I don't feel confident saying that we are doing something wrong.
Closing, as this requires Cirrus CI, which we are moving away from.