
Segfault in cycle checker(?)

Open derrickturk opened this issue 3 years ago • 0 comments

Opening apology: this is a much-reduced version of a solution to Advent of Code's 2019 day 7 puzzle, which is hard to explain in a vacuum but comes down to implementing a tiny VM with async I/O and running multiple instances of it in communication with each other. After cutting out a lot of extraneous things, I'm left with the attached source files and input. I'm reluctant to reduce it further because I'm starting to think the crash has nothing to do with the VM logic itself, but it's challenging to produce relevant input for a cut-down VM.

My notes refer to testing on both Windows 10 and Arch Linux via WSL2. In both cases ponyc is the latest release, 0.51.2. This is a 4-core machine.

Run with: ./intpony input.txt

The three observed behaviors are:

  • Almost instant exit with no output (~60-70% of the time)
  • Almost instant exit with correct output (~15% of the time)
  • Slow exit AFTER correct output (~15% of the time)

The only runtime option I've found to have any effect on this is --ponymaxthreads 1, which on Windows seemingly guarantees the intended output (with fast exit).
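For anyone who wants to reproduce the rough percentages above, a small harness along these lines can tally outcomes over many runs. This is only a sketch and not part of the attached program: the function names, the 2-second cutoff for a "slow" exit, treating any nonzero exit status as a crash, and looking only at stdout are all assumptions. It also checks only whether anything was printed, not whether the output is correct.

```python
import subprocess
import time
from collections import Counter

SLOW_SECONDS = 2.0  # assumed cutoff for a "slow exit"; tune for the machine


def classify(returncode, stdout, elapsed, slow_seconds=SLOW_SECONDS):
    """Label one run by exit status, presence of output, and exit speed."""
    # Assumption: any nonzero status (e.g. a negative signal code on Linux)
    # counts as a crash.
    status = "crash" if returncode != 0 else "clean"
    output = "output" if stdout.strip() else "no-output"
    speed = "slow" if elapsed >= slow_seconds else "fast"
    return f"{status}/{output}/{speed}"


def tally(cmd, runs=100):
    """Run cmd (a list of arguments) repeatedly and count each outcome label."""
    counts = Counter()
    for _ in range(runs):
        start = time.monotonic()
        proc = subprocess.run(cmd, capture_output=True, text=True)
        elapsed = time.monotonic() - start
        counts[classify(proc.returncode, proc.stdout, elapsed)] += 1
    return counts
```

Calling tally(["./intpony", "input.txt"]) and then again with "--ponymaxthreads", "1" appended to the argument list gives two sets of counts to compare.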

Compiled with -d, I get additional possible outcomes including mismatches between the count of RUN outputs and DONE outputs. (This should not happen given the input - each VM/Cpu should run to successful halt.)

The program usually ends in a segfault, on both the release and debug binaries. gdb often reports it in pony_os_peername on Windows, and in Array_I64_val_Trace on WSL2. Oddly, the runs that do not segfault are also the ones that produce no output, while successful runs print their output and then segfault. With --ponymaxthreads 1, I get no segfault on Windows, but I still get segfaults on WSL2.

The test program creates 100 Cpu actors total, each with a 519-"word" memory (i.e. an Array[I64] with 519 entries). This can be adjusted; it seems that segfaults get more likely as the number goes up. I've never seen a segfault with only 1 or 2 actors, but I have with 4 or 5.

Full stack trace from a crash on WSL2:

#0  0x0000555555568160 in Array_I64_val_Trace ()
#1  0x000055555556895d in Array_u3_t2_$1$10_iso_$1$10_iso_collections__MapEmpty_val_collections__MapDeleted_val_Trace
    ()
#2  0x00007fffe6807600 in ?? ()
#3  0x00007fffe657d600 in ?? ()
#4  0x00007fffffffe750 in ?? ()
#5  0x00007ffff7c91c48 in ?? ()
#6  0x00007ffff7c91e00 in ?? ()
#7  0x0000000000000700 in ?? ()
#8  0x000055555557d534 in ponyint_actor_final ()
#9  0x000055555557f07b in ponyint_cycle_terminate ()
#10 0x000055555558726f in ponyint_sched_shutdown ()
#11 0x0000555555585acf in ponyint_sched_start ()
#12 0x00005555555877d9 in pony_start ()
#13 0x000055555557ce17 in main ()

I've also seen:

#0  0x00007fffe671ccc0 in ?? ()
#1  0x000055555557d534 in ponyint_actor_final ()
#2  0x000055555557f07b in ponyint_cycle_terminate ()
#3  0x000055555558726f in ponyint_sched_shutdown ()
#4  0x0000555555585acf in ponyint_sched_start ()
#5  0x00005555555877d9 in pony_start ()
#6  0x000055555557ce17 in main ()

The plot thickened with a suggestion on Zulip to run with --ponynoblock, disabling the cycle checker (IIUC). This resulted in "dropped output" about 50-75% of the time (rate maybe dependent on running under a debugger or not), but no segfaults.

intpony.zip

derrickturk, Sep 20 '22 02:09