Add linux musl tests on Arm

Open SeanTAllen opened this issue 2 years ago • 13 comments

SeanTAllen avatar Oct 05 '21 02:10 SeanTAllen

I notice that the tests didn't pass for musl.

jemc avatar Oct 05 '21 18:10 jemc

Yup. We appear to have a problem when running on musl.

SeanTAllen avatar Oct 05 '21 18:10 SeanTAllen

Setting up Alpine on a Pi to test this looks pretty irritating. My first try might be to see if Docker can run on Raspbian.

SeanTAllen avatar Oct 06 '21 01:10 SeanTAllen

Rebased against main. Still need to investigate the cause of the musl failures.

SeanTAllen avatar Oct 22 '21 18:10 SeanTAllen

It looks like it is the test harness that is segfaulting. Interesting.

SeanTAllen avatar Jan 30 '22 20:01 SeanTAllen

The image used for this is based on Alpine 3.12 and needs to be updated to be based on 3.16.

SeanTAllen avatar Jun 11 '22 12:06 SeanTAllen

@ergl could you try running this in Docker on an Arm machine to debug what is going on? I think the runner is crashing, but I don't have an easy way to debug this.

SeanTAllen avatar Jul 08 '22 20:07 SeanTAllen

@SeanTAllen Sure, I can try it tomorrow when I have some time off.

ergl avatar Jul 09 '22 07:07 ergl

@SeanTAllen I managed to reproduce the segfault but unfortunately ran out of time to debug the problem. To trigger a segfault, it is enough to run the runner:

# ./build/build_debug/test/libponyc-run/runner/runner -h
Segmentation fault

I didn't have lldb installed on the Docker image, so I was out of luck. But that would be the next step.

ergl avatar Jul 10 '22 22:07 ergl

Thanks @ergl. Will you be doing the next step?

SeanTAllen avatar Jul 10 '22 23:07 SeanTAllen

@SeanTAllen I can, but not until next week.

ergl avatar Jul 11 '22 05:07 ergl

Here's the backtrace I get from the runner:

* thread #3, name = 'runner', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
  * frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
    frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
    frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
    frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
    frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
    frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
    frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
    frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
    frame #8: 0x0000aaaaaab0e1b4 runner`pony_error at posix_except.c:37:3
    frame #9: 0x0000aaaaaaae05d0 runner`___lldb_unnamed_symbol2258 + 1016
    frame #10: 0x0000aaaaaaade728 runner`___lldb_unnamed_symbol2250 + 840
    frame #11: 0x0000aaaaaaadfc20 runner`___lldb_unnamed_symbol2253 + 856
    frame #12: 0x0000aaaaaaae4c70 runner`___lldb_unnamed_symbol2286 + 2128
    frame #13: 0x0000aaaaaaad4f00 runner`Main_Dispatch + 92
    frame #14: 0x0000aaaaaab03e90 runner`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
    frame #15: 0x0000aaaaaab035d0 runner`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
    frame #16: 0x0000aaaaaab1383c runner`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
    frame #17: 0x0000aaaaaab12d14 runner`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
    frame #18: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
    frame #19: 0x0000fffff7fad418 ld-musl-aarch64.so.1
    frame #20: 0x0000aaaaaab1c654 runner`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
    frame #21: 0x0000aaaaaab12c3c runner`ponyint_sched_start(library=false) at scheduler.c:1210:9
    frame #22: 0x0000aaaaaab15008 runner`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc18) at start.c:332:7
    frame #23: 0x0000aaaaaab033c0 runner`main + 240
    frame #24: 0x0000fffff7f73274 ld-musl-aarch64.so.1

It seems like the source of the failure is here: pony_error at posix_except.c:37:3, which points to this:

https://github.com/ponylang/ponyc/blob/f6e1b60cab21a8da21ca414e75595a8b497bdb5c/src/libponyrt/lang/posix_except.c#L37

So it seems like we're doing something wrong when it comes to exception unwinding. This can be verified with this minimal Pony program that also causes the segfault:

actor Main
  new create(env: Env) =>
    try
      error
    end

The backtrace for the above program:

* thread #3, name = 'crash_example', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
  * frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
    frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
    frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
    frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
    frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
    frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
    frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
    frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
    frame #8: 0x0000aaaaaaab59c0 crash_example`pony_error at posix_except.c:37:3
    frame #9: 0x0000aaaaaaaaa4f0 crash_example`Main_Dispatch + 56
    frame #10: 0x0000aaaaaaaac5e4 crash_example`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
    frame #11: 0x0000aaaaaaaabd24 crash_example`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
    frame #12: 0x0000aaaaaaabaa6c crash_example`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
    frame #13: 0x0000aaaaaaab9f44 crash_example`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
    frame #14: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
    frame #15: 0x0000fffff7fad418 ld-musl-aarch64.so.1
    frame #16: 0x0000aaaaaaac450c crash_example`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
    frame #17: 0x0000aaaaaaab9e6c crash_example`ponyint_sched_start(library=false) at scheduler.c:1210:9
    frame #18: 0x0000aaaaaaabc238 crash_example`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc68) at start.c:332:7
    frame #19: 0x0000aaaaaaaabbc0 crash_example`main + 240
    frame #20: 0x0000fffff7f73274 ld-musl-aarch64.so.1

Edit: this was tested against the latest commit of this branch (1e82be3cfb784eef74311202a81838999be8497d)

ergl avatar Jul 19 '22 17:07 ergl

Given this only happens on Arm musl, I don't feel confident saying that we are doing something wrong.

SeanTAllen avatar Jul 19 '22 18:07 SeanTAllen

Closing, as this requires Cirrus CI, which we are moving away from.

SeanTAllen avatar Aug 15 '23 22:08 SeanTAllen