Speed up close_fds with the new close_range() Linux/FreeBSD syscall
Background:
- https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.9-Close-Range
- https://lwn.net/Articles/789023/
As written in https://github.com/haskell/process/blob/cb1d1a6ead68f0e1b209277e79ec608980e9ac84/System/Process/Common.hs#L91:
This implementation will call `close()` on every fd from 3 to the maximum of open files, which can be slow for high maximum of open files.
The new `close_range()` syscall solves this by closing them all in one go. According to the LWN link, it is very fast, and you can pass MAXINT as the upper bound.
The code that needs to be augmented (with CPP):
https://github.com/haskell/process/blob/cb1d1a6ead68f0e1b209277e79ec608980e9ac84/cbits/posix/runProcess.c#L255-L273
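To make the request concrete, here is a minimal sketch of what a CPP-guarded version could look like. It is not the code from `runProcess.c`: the helper name `close_fds_from` is hypothetical, the raw `syscall()` form is an assumption chosen so the sketch does not depend on a libc wrapper, and only a Linux branch is shown (FreeBSD would need its own). If the syscall is unavailable or fails, it falls back to the existing loop.

```c
#include <assert.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: close every file descriptor >= first_fd. */
static int close_fds_from(int first_fd)
{
    assert(first_fd >= 0);

#if defined(__linux__) && defined(SYS_close_range)
    /* One syscall closes the whole range. Kernels older than 5.9
     * fail with ENOSYS, in which case we fall through to the loop. */
    if (syscall(SYS_close_range, (unsigned int)first_fd, UINT_MAX, 0U) == 0)
        return 0;
#endif

    /* Fallback: the slow path described above. */
    long max_fd = sysconf(_SC_OPEN_MAX);
    if (max_fd < 0)
        max_fd = 256; /* arbitrary guess for this sketch */
    for (long fd = first_fd; fd < max_fd; fd++)
        (void)close((int)fd);
    return 0;
}
```

The guard on `SYS_close_range` keeps the file compiling against older kernel headers, and the runtime fallback covers a new build running on an old kernel.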
Since you're closing all fds, you could call it with the `CLOSE_RANGE_UNSHARE` flag, i.e.
close_range(4, UINT_MAX, CLOSE_RANGE_UNSHARE)
The kernel will detect that you're closing all file descriptors and will make a copy of only the first three file descriptors, so it doesn't need to do any actual work closing all the others. Obviously, if you do this in a threaded environment, then you can't use it if you want to close the fds for all threads. :)
I've just come upon a really pathological behavior surrounding this code, which happens when the call to `sysconf(_SC_OPEN_MAX)` returns a huge number.
On my Ubuntu machine, `getconf OPEN_MAX` returns `1048576`. Fine, my machine can do 1 million superfluous `close` file descriptor calls without a noticeable delay.
But then I found a system (Kind Kubernetes environment on NixOS) where that variable is `1073741816`! Now a call to `createProcess` takes 3.5 minutes and rails every CPU on my machine that entire time while the loop counts to a billion. (Interestingly, it rails all CPUs on GHC 9.0 and only a single CPU on GHC 9.2.)
So I'd request two things:
- Please let's use `close_range` on supported systems (apparently it became available in Linux 5.9).
- While researching this I learned the "normal" way to close file descriptors is to look through `/proc/self/fd` to find the file descriptors to close, and only if that fails do you fall back to the `sysconf` call and loop. I think taking this step when `close_range` is not available would be much better.
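For reference, the directory-scanning approach could be sketched roughly as below. This is an illustration under assumptions, not a proposed patch: the helper name `close_fds_via_proc` is hypothetical, and it relies on the Linux `/proc/self/fd` layout. One caveat worth checking before adopting it: `opendir()` may allocate memory, which can be a problem between `fork()` and `exec()` in a multithreaded process.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close every open descriptor >= first_fd by
 * listing /proc/self/fd. Returns 0 on success, or -1 if the directory
 * cannot be opened, so the caller can fall back to the sysconf loop. */
static int close_fds_via_proc(int first_fd)
{
    assert(first_fd >= 0);

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir); /* must stay open while we iterate */
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        errno = 0;
        long fd = strtol(entry->d_name, &end, 10);
        /* Skip non-numeric entries such as "." and "..". */
        if (errno != 0 || end == entry->d_name || *end != '\0')
            continue;
        if (fd < first_fd || fd == dir_fd)
            continue;
        (void)close((int)fd);
    }
    closedir(dir); /* also closes dir_fd */
    return 0;
}
```

The cost here scales with the number of descriptors that are actually open rather than with `OPEN_MAX`, which is what makes it a better fallback on systems where that limit is around a billion.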