Segfault when using references in thread
Running code that uses a reference to an element in an array in a thread segfaults with the following backtrace:
frame #0: 0x000000018f386a60 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018f3bec20 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018f2cba30 libsystem_c.dylib`abort + 180
frame #3: 0x000000010036c32c php`zend_mm_panic + 52
frame #4: 0x000000010036e930 php`_efree_32 + 120
frame #5: 0x00000001003ea2f0 php`ZEND_ASSIGN_REF_SPEC_CV_VAR_HANDLER + 288
frame #6: 0x00000001003cdc38 php`execute_ex + 172
frame #7: 0x000000010104d03c parallel.so`php_parallel_scheduler_run(runtime=<unavailable>, frame=0x0000000102413020) at scheduler.c:331:13 [opt]
frame #8: 0x000000010104c62c parallel.so`php_parallel_thread(arg=0x000000010125f3c0) at scheduler.c:501:9 [opt]
frame #9: 0x000000018f3bef94 libsystem_pthread.dylib`_pthread_start + 136
You can reproduce using https://github.com/realFlowControl/1brc/blob/main/calculateAverage.php, the opline points to line 77 (where the array is accessed). If I comment the entire if-block, it segfaults in line 86 in the else-block.
It started with https://github.com/krakjoe/parallel/commit/32cd9c3776bcab820000fa6e0ebc75cae6205810 and the change that leads to this is this line:
https://github.com/krakjoe/parallel/blob/d8a01139227b41115e337c11f58901b77104ab60/src/copy.c#L191
Setting this back to 2 will fix the segfault, but lead to ASAN tests failing ;-)
I sadly have no idea what is going on :-(
Hi @realFlowControl!
Does this happen on any PHP version?
So far this issue is in the develop branch and the changes that lead to that are not yet released (that's why i'm withholding new minor releases). So it should not be related to https://github.com/hschimpf/parallel-sdk/issues/26
Interestingly another way to workaround in case no OPcache is around is to declare the $stations array static in
https://github.com/realFlowControl/1brc/blob/57cc8737e9f2593c907ce6e845e682fadc1ccf31/calculateAverage.php#L59
Smaller reproducer:
<?php
$run = function() {
$a = [];
for ($i = 0; $i < 100; $i++) {
$b = &$a[$i];
}
return 0;
};
for ($i = 0; $i < 10; $i++) {
$futures[$i] = \parallel\run($run, []);
}
for ($i = 0; $i < 10; $i++) {
var_dump($futures[$i]->value());
}
What is happening is that an array literal is copied to $a in $a = [];, but its refcount is not incremented because the IS_TYPE_REFCOUNTED flag is not set. Multiple threads will have the same array literal in $a, with a refcount of 1, and will modify it, which breaks semantics and also causes corruption and crash.
A refcount of 2 prevented this by forcing the VM to separate the array before modifying it.
Reverting two lines here and here would fix the issue, but this is inconsistent with other code in copy.c.
Changing this line to Z_TYPE_FLAGS_P(slot) &= ~IS_TYPE_COLLECTABLE; (so the IS_TYPE_REFCOUNTED is not removed) would also fix the problem, but this may be unsafe as ADDREF/DELREF are not thread safe.
Edit: When OPcache is enabled, parallel relies on it to manage literals. In this case, literals have a refcount of 2.