emscripten
emscripten copied to clipboard
test_pthread_proxying_canceled_work: RuntimeError: operation does not support unaligned accesses
I've seen this a few times in CI recently.
======================================================================
FAIL: test_pthread_proxying_canceled_work (test_core.lsan)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/chrome-bot/.cache/vpython-root.1000/store/cpython+tkj8tpr2n0oj8m0t270p0m71s0/contents/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
yield
File "/home/chrome-bot/.cache/vpython-root.1000/store/cpython+tkj8tpr2n0oj8m0t270p0m71s0/contents/lib/python3.8/unittest/case.py", line 676, in run
self._callTestMethod(testMethod)
File "/home/chrome-bot/.cache/vpython-root.1000/store/cpython+tkj8tpr2n0oj8m0t270p0m71s0/contents/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
method()
File "/b/s/w/ir/x/w/install/emscripten/test/common.py", line 348, in decorated
f(self, *args, **kwargs)
File "/b/s/w/ir/x/w/install/emscripten/test/test_core.py", line 2484, in test_pthread_proxying_canceled_work
self.do_run_in_out_file_test(
File "/b/s/w/ir/x/w/install/emscripten/test/common.py", line 1931, in do_run_in_out_file_test
output = self._build_and_run(srcfile, expected, **kwargs)
File "/b/s/w/ir/x/w/install/emscripten/test/common.py", line 1969, in _build_and_run
js_output = self.run_js(js_file, engine, args,
File "/b/s/w/ir/x/w/install/emscripten/test/common.py", line 1556, in run_js
self.fail('JS subprocess failed (%s): %s (expected=%s). Output:\n%s' % (error.cmd, error.returncode, assert_returncode, ret))
File "/home/chrome-bot/.cache/vpython-root.1000/store/cpython+tkj8tpr2n0oj8m0t270p0m71s0/contents/lib/python3.8/unittest/case.py", line 753, in fail
raise self.failureException(msg)
AssertionError: JS subprocess failed (/b/s/w/ir/cache/builder/emscripten-releases/node-v20.14.0-linux-x64/bin/node --stack-trace-limit=50 --trace-uncaught /b/s/w/ir/x/t/emtest_jarzrh1c/emscripten_test_lsan_7vafnkyn/test_pthread_proxying_canceled_work.js): 1 (expected=0). Output:
testing cancel followed by proxy
wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1
RuntimeError: operation does not support unaligned accesses
at test_pthread_proxying_canceled_work.wasm.a_cas (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[186]:0x5205)
at test_pthread_proxying_canceled_work.wasm.__private_cond_signal (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[192]:0x52a9)
at test_pthread_proxying_canceled_work.wasm.pthread_cond_signal (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[174]:0x4ade)
at test_pthread_proxying_canceled_work.wasm.emscripten_proxy_finish (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[128]:0x3c87)
at test_pthread_proxying_canceled_work.wasm.call_then_finish_task (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[141]:0x422d)
at test_pthread_proxying_canceled_work.wasm.call_with_ctx (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[136]:0x40de)
at test_pthread_proxying_canceled_work.wasm.em_task_queue_execute (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[110]:0x3506)
at test_pthread_proxying_canceled_work.wasm.receive_notification (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[119]:0x3873)
at test_pthread_proxying_canceled_work.wasm.em_task_queue_execute (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[110]:0x3506)
at test_pthread_proxying_canceled_work.wasm._emscripten_check_mailbox (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:wasm-function[291]:0x702d)
at /b/s/w/ir/x/t/emtest_jarzrh1c/emscripten_test_lsan_7vafnkyn/test_pthread_proxying_canceled_work.js:1117:12
at callUserCallback (/b/s/w/ir/x/t/emtest_jarzrh1c/emscripten_test_lsan_7vafnkyn/test_pthread_proxying_canceled_work.js:4730:5)
at checkMailbox (/b/s/w/ir/x/t/emtest_jarzrh1c/emscripten_test_lsan_7vafnkyn/test_pthread_proxying_canceled_work.js:4761:5)
Thrown at:
at $a_cas (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:20998)
at $__private_cond_signal (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:21162)
at $pthread_cond_signal (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:19167)
at $emscripten_proxy_finish (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:15496)
at $call_then_finish_task (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:16942)
at $call_with_ctx (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:16607)
at $em_task_queue_execute (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:13575)
at $receive_notification (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:14452)
at $em_task_queue_execute (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:13575)
at $_emscripten_check_mailbox (wasm://wasm/test_pthread_proxying_canceled_work.wasm-0010c28e:1:28718)
@tlively any idea what this might be?
This is pretty weird. It's failing while trying to finish synchronously proxied work on a proxy worker that has already been cancelled. That work should never have been sent to the cancelled proxy worker to begin with.
To send the work, the proxying thread has to increment the refcount on the proxy worker thread's mailbox with emscripten_thread_mailbox_ref:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/system/lib/pthread/thread_mailbox.c#L18-L33
But this should fail if the proxy worker thread's mailbox has been closed, which it should have been. _emscripten_thread_mailbox_shutdown synchronously waits for the mailbox to be closed:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/system/lib/pthread/thread_mailbox.c#L50-L57
_emscripten_thread_mailbox_shutdown is called as one of the first steps in _emscripten_thread_exit:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/system/lib/pthread/pthread_create.c#L307
So either this mailbox closing mechanism is failing somehow and the mailbox is being resurrected, or the thread is not fully cancelled (specifically never got to the mailbox shutdown step) before the proxying happens in the test.
The test ensures that the proxy worker thread has been fully cancelled by registering a TLS key with a destructor called set_flag:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/test/pthread/test_pthread_proxying_canceled_work.c#L348
The proxy worker thread is passed the address of a canceled flag:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/test/pthread/test_pthread_proxying_canceled_work.c#L68-L69
Which it then associates with the TLS key before cancelling itself.
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/test/pthread/test_pthread_proxying_canceled_work.c#L49-L51
set_flag then writes a 1 into the flag address on the next turn of the event loop, which should be after the thread has been cancelled and the mailbox closed:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/test/pthread/test_pthread_proxying_canceled_work.c#L36-L44
The test does not continue on to perform the proxying until this flag has been set:
https://github.com/emscripten-core/emscripten/blob/106a27c1018cf3c6104541eb176b9d0182805841/test/pthread/test_pthread_proxying_canceled_work.c#L71-L73
To investigate this further, I might punch through some abstractions to assert in the test that the proxy worker's thread mailbox is closed after that while (!canceled) loop. I would expect that assertion to fail precisely whenever this error you're seeing would occur. From there I would drill down and try to rule out the possibility that the flag is simply set before the cancelled thread closes its mailbox. If the bug is not there, then it must be in the mailbox shutdown mechanism itself.