std.Thread.Pool: process tree cooperation
Problem statement:
- Parent process creates a `std.Thread.Pool` of size `number_of_logical_cores`.
- Child process creates a `std.Thread.Pool` of size `number_of_logical_cores`.
- Now there are `2 * number_of_logical_cores` active threads, causing cache thrashing, which harms performance.
On POSIX systems, std.Thread.Pool should integrate by default with the POSIX jobserver protocol, meaning that if the jobserver environment variable is detected, the pool should coordinate through it so that the entire process tree shares a single budget of concurrently running threads.
On macOS, maybe we should use libdispatch instead. It's bundled with libSystem so it's always available and accomplishes the same goal.
I'm not sure what to do on Windows.
This is primarily a standard library enhancement, however the build system and compiler both make heavy use of std.Thread.Pool so they would observe behavior changes.
In particular, running zig build as a child process from make would start cooperating and not create too many threads. Similarly, running the zig compiler from the zig build system would do the same. The reverse is also true: running make as a child process from the zig build system would start cooperating. And then there are other third party tools that have standardized on the POSIX jobserver protocol, such as cargo.
There is one concern however, which is that the protocol leaves room for "thread leaks" to occur if child processes crash. I'm not sure the best way to mitigate this. The problem happens when a child process has obtained a thread token from the pipe, and then crashes before writing the token back to the pipe. In that case the thread pool permanently has one fewer thread active than before, which is suboptimal, and would cause a deadlock if it happened a number of times exceeding the thread pool size.
Related:
- #12101
Alternative POSIX strategy based on advisory record locks:
Root process opens a new file with shm_open and writes N bytes to the file, where N is the thread pool size. This file descriptor is passed down the process tree. Each byte in the file represents an available logical core. Each thread in each process's thread pool holds an advisory record lock (`fcntl(F_SETLKW)`) on the corresponding byte while working.
Advisory record locks are automatically released by the kernel when a process dies.
This unfortunately would mean that Zig could not play nicely with other applications such as make and cargo. But I think it's worth it. The whole point of the Zig project is to improve the status quo; otherwise we could all just keep using C. The make jobserver protocol is fundamentally broken, so Zig will step up and create a better protocol that has similarly low overhead but not this glaring problem. Then it will be up to make and cargo to upgrade to the better standard.
As for the strategy I outlined above, I have not evaluated its efficacy yet. I'll report back with some performance stats when I do.
Alternative POSIX strategy based on UNIX domain sockets:
Root process listens on a unix domain socket. That fd is passed down the process tree. To get a thread token a child process connects to the socket. To return the token, disconnect. The root process only accepts N concurrent connections and stops accepting when it's maxed out.
When a child process dies, the OS causes the connection to end, allowing the root process to accept a new connection.
Two upsides compared to the advisory record lock idea:
- Operating systems are likely not optimized for very large numbers of threads waiting to lock different bytes in the same file, which that strategy depends on.
- The advisory record lock strategy requires each process to have a thread pool, while the make jobserver protocol allows for lazy thread spawning. This strategy allows for lazy thread spawning as well.
Alternative pipes proposal (also incompatible with jobserver protocol):
This protocol overcomes the issue of the make jobserver protocol where it's impossible for the server to tell when a child is taking a job (i.e., reading one byte). We overcome this by having the child notify the server before taking a job, so the server knows it needs to write one byte for a waiting child on its private pipe.
pros:
- portability
- reliably detect when child dies
cons:
- filesystem activity for named-pipes has to be managed carefully
- more complex than unix domain socket alternative
PIPE OVERVIEW:
- 1 global pipe shared between the server and all children; this is the mux-pipe, where the server is the reader and the children are writers
- 1 unique named pipe between server and client; that is, each server-client pair has a private pipe, opened in a coordinated fashion by client and server, where the server is the writer and the child is the reader
IPC setup
- server opens `tmpdir`, keeping the handle open and allowing inheritance
- server creates the `mux-pipe`; the parent is the reader, the child(ren) are writers
- child creates the `job-pipe`, a named fifo, using the inherited `tmpdir` and a randomized name, maybe a pattern like "[RANDOM].[PID]" where the random part is the important part and the pid is just for semi-reliable knowledge
- `job-pipe` is write for the parent, read for the child
- the `job-pipe` name/basename is the child "key", which:
  - is unique amongst all children in the tree
  - is used as the basename path component of the named fifo
  - is used as the first parameter for `mux-pipe` messaging
job flow
- child: send `mux-pipe` message `{ child, checkout }`, where the first param is the child-key
- server: recv `mux-pipe` message `{ child, checkout }`; open (and remember the handle of) the named fifo for that child-key
- server: write 1 byte to the `job-pipe`, granting the job; the child will not proceed until this is done
- child: read the 1 byte on the `job-pipe` before starting the job
- child: do the job
- child: send `mux-pipe` message `{ child, return }`
- server: recv `mux-pipe` message `{ child, return }`
- server: poll the `job-pipe` for other-end closure, and free outstanding checkouts that haven't been returned
cleanup
- child unlinks the named pipe when done; a race is ok
- server unlinks the named pipe on other-end pipe closure; a race is ok
IPC and POSIX jobserver interaction for coordinating threaded work seem inefficient. Could it all be avoided by making zig commands (i.e. build-exe, test, etc.) normal functions accepting string args, then running them as tasks in a global Thread.Pool that they (+ build_runner.zig/test_runner.zig) can reach into?
After some discussion and design work with Cargo folks and the rest of the Zig team, I've put together a proposed protocol called the "Robust Jobserver" which aims to solve the "crashing child process" problem of the GNU Jobserver. As written, it supports Windows and most(?) POSIX systems.
Draft: https://codeberg.org/mlugg/robust-jobserver/src/branch/main/spec.md
I'd be interested to hear if anyone has thoughts or concerns regarding that specification. Keep in mind that the goal is that this system be useful not only to the Zig project, but also to other compilers, build systems, etc (really, anything which needs to coordinate CPU-bound work across a process tree). If there aren't any serious concerns, I intend to work on supporting it in the Zig compiler and build system at some point soon.