[LibOS] `execve`: eliminate race on `clear_child_tid` before VMAs deallocation
Description of the changes
Fixes #2148
This PR makes execve() wait until every sibling thread’s *clear_child_tid is zeroed before deallocating their VMAs.
Implementation details:
- Grab the `g_thread_list` as soon as the calling thread acquires `first`.
- For each sibling thread, the calling thread will check:
  - If `*clear_child_tid != 0`, invoke `futex_wait()`; it will be awakened by `release_clear_child_tid()` via `futex_wake()`.
- After all other threads' (except the main thread) `*clear_child_tid` are cleared, the calling thread then starts to deallocate VMAs.
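For readers unfamiliar with the wait/wake handshake described above, here is a minimal standalone sketch of the idea. It is not the code from this PR: it uses plain pthreads and the raw Linux `futex` syscall instead of Gramine's internal helpers, and `struct demo_thread`, `sibling_main()`, `wait_for_clear_child_tids()` and `run_demo()` are hypothetical names made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_SIBLINGS 16

/* Hypothetical stand-in for a LibOS thread object; only the relevant field. */
struct demo_thread {
    pthread_t handle;
    _Atomic uint32_t clear_child_tid; /* nonzero while the thread is alive */
};

/* Thin wrapper around the raw Linux futex syscall (no timeout). */
static long futex(_Atomic uint32_t* addr, int op, uint32_t val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Sibling side: zero the word, then wake any waiter. This plays the role
 * that the description assigns to release_clear_child_tid(). */
static void* sibling_main(void* arg) {
    struct demo_thread* t = arg;
    usleep(1000); /* pretend to do some work before exiting */
    atomic_store(&t->clear_child_tid, 0);
    futex(&t->clear_child_tid, FUTEX_WAKE, INT_MAX);
    return NULL;
}

/* Exec side: block until every sibling's word has been zeroed. */
static void wait_for_clear_child_tids(struct demo_thread* threads, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t val;
        while ((val = atomic_load(&threads[i].clear_child_tid)) != 0) {
            /* Sleeps only if the word still equals `val`; otherwise returns
             * immediately, so a wake-up cannot be lost. The loop re-checks
             * the word after every return (including spurious ones). */
            futex(&threads[i].clear_child_tid, FUTEX_WAIT, val);
        }
    }
}

/* Spawns `n` siblings, waits for them as above, and returns how many words
 * are still nonzero afterwards. */
int run_demo(int n) {
    static struct demo_thread threads[MAX_SIBLINGS];
    assert(n >= 0 && n <= MAX_SIBLINGS);

    for (int i = 0; i < n; i++) {
        atomic_store(&threads[i].clear_child_tid, (uint32_t)(i + 1));
        int ret = pthread_create(&threads[i].handle, NULL, sibling_main, &threads[i]);
        assert(ret == 0);
    }

    wait_for_clear_child_tids(threads, n);
    /* Only at this point would it be safe to release memory that the
     * siblings' clear_child_tid words may live in. */

    int still_set = 0;
    for (int i = 0; i < n; i++) {
        still_set += atomic_load(&threads[i].clear_child_tid) != 0;
        pthread_join(threads[i].handle, NULL);
    }
    return still_set;
}
```

Calling `run_demo(4)` from a `main()` returns 0, since the waiting loop only exits once each word is zero. The sketch is Linux-only and may need `-pthread` when linking on older toolchains.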
How to test this PR?
Run `gramine-sgx exec_same [args_#1...args_#49]` repeatedly in a loop.
Without this PR, the test usually fails within a few minutes, especially on the branch of PR #1795, because of the issue mentioned above; the main branch takes longer to fail. With this PR applied, the same loop runs for hours without any failures on the branch of PR #1795.
Jenkins, test this please
Jenkins, retest this please
(All failures seem to be connectivity issues.)
ERROR: Checkout failed
[2025-08-04T11:02:59.002Z] java.io.StreamCorruptedException: invalid stream header: 636F7272
...
25-08-04T11:02:59.002Z] Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to penguins-3-noble
[2025-08-04T11:02:59.002Z] Caused: hudson.remoting.RequestAbortedException
[2025-08-04T11:15:36.177Z] Connecting to busybox.net (busybox.net)|140.211.167.122|:443... connected.
[2025-08-04T11:16:44.937Z] Unable to establish SSL connection.
[2025-08-04T11:16:44.937Z] download: WARNING: Hash mismatch: Expected 415fbd89e5344c96acf449d94a6f956dbed62e18e835fc83e064db33a34bd549 but received e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
[2025-08-04T11:16:44.937Z] download: ERROR: Failed to download 'busybox.tar.bz2' (415fbd89...)! No URLs left to try.
[2025-08-04T11:16:44.937Z] make: *** [Makefile:13: busybox.tar.bz2] Error 1
Could someone kindly help me trigger a Jenkins retest? My retest command doesn’t seem to work—possibly due to a permission issue. I also don’t have access to the build logs.
Jenkins, retest this please
Add to whitelist
The deb job failed with: The repository 'http://deb.debian.org/debian bullseye-backports Release' no longer has a Release file.
This looks legit, and unrelated to this PR.
Jenkins, retest this please. Just seeing if the webhook gets reactivated.