
Speed up close_fds with the new close_range() Linux/FreeBSD syscall

Open nh2 opened this issue 4 years ago • 2 comments

Background:

  • https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.9-Close-Range
  • https://lwn.net/Articles/789023/

As written in https://github.com/haskell/process/blob/cb1d1a6ead68f0e1b209277e79ec608980e9ac84/System/Process/Common.hs#L91

This implementation calls close() on every fd from 3 up to the maximum number of open files, which can be slow when that maximum is high.

The new close_range() syscall solves this by closing them all in one go. According to the LWN link, it is very fast, and you can pass UINT_MAX as the upper bound.

The code that needs to be augmented (with CPP):

https://github.com/haskell/process/blob/cb1d1a6ead68f0e1b209277e79ec608980e9ac84/cbits/posix/runProcess.c#L255-L273
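A minimal sketch of what such an augmentation might look like, assuming a kernel that may or may not provide close_range(). The helper name `close_fd_range` and the default of 1024 when the limit is unknown are illustrative assumptions, not code from the process library. The raw `syscall()` form is used so the sketch does not depend on a libc wrapper being available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of the process library): close every
 * descriptor in [first, last]. Always returns 0; errors from individual
 * close() calls in the fallback loop are ignored. */
static int close_fd_range(unsigned int first, unsigned int last)
{
#ifdef SYS_close_range
    /* One syscall closes the whole range (Linux >= 5.9). */
    if (syscall(SYS_close_range, first, last, 0U) == 0)
        return 0;
    /* For example ENOSYS on an older kernel: fall through to the loop. */
#endif
    long open_max = sysconf(_SC_OPEN_MAX);
    if (open_max < 0)
        open_max = 1024;    /* arbitrary assumption when the limit is unknown */

    for (unsigned long fd = first;
         fd <= last && fd < (unsigned long)open_max; fd++)
        (void)close((int)fd);   /* EBADF for fds that are not open is expected */
    return 0;
}
```

In the forked child this might be called as `close_fd_range(3, UINT_MAX)`. The fallback loop is still bounded by `sysconf(_SC_OPEN_MAX)`, so it stays as slow as the existing code on kernels without close_range().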

nh2, Aug 12 '20 03:08

Since you're closing all fds you could call it with the CLOSE_RANGE_UNSHARE flag, i.e.

close_range(4, UINT_MAX, CLOSE_RANGE_UNSHARE)

The kernel will detect that you're closing all file descriptors above the start of the range, so it copies only the descriptors below it into the new table and doesn't need to do any actual work closing all the others. Obviously, if you do this in a threaded environment then you can't use it if you want to close the fds for all threads, because the unshared table belongs only to the calling thread. :)
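A hedged sketch of that call, Linux-only. The helper name `keep_only_fds_below` is made up for illustration, and the fallback definition of `CLOSE_RANGE_UNSHARE` assumes the value from `<linux/close_range.h>` for builds against older headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)   /* value from <linux/close_range.h> */
#endif

/* Hypothetical helper: give the calling thread a private fd table that
 * keeps only the descriptors below `first`.
 * Returns 0 on success, -1 with errno set on failure. */
static int keep_only_fds_below(unsigned int first)
{
#ifdef SYS_close_range
    return (int)syscall(SYS_close_range, first, UINT_MAX,
                        CLOSE_RANGE_UNSHARE);
#else
    errno = ENOSYS;     /* built against headers without close_range */
    return -1;
#endif
}
```

A caller would still need a fallback for the failure case, for example when the kernel predates close_range() and the call fails with ENOSYS.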

brauner, Aug 14 '20 14:08

I've just come upon a really pathological behavior surrounding this code which happens when the call to sysconf(_SC_OPEN_MAX) returns a huge number.

On my Ubuntu machine, getconf OPEN_MAX returns 1048576. Fine, my machine can do 1 million superfluous close file descriptor calls without a noticeable delay.

But then I found a system (Kind Kubernetes environment on NixOS) where that variable is 1073741816! Now a call to createProcess takes 3.5 minutes and rails every CPU on my machine that entire time while the loop counts to a billion. (Interestingly, it rails all CPUs on GHC 9.0 and only a single CPU on GHC 9.2.)

So I'd request two things:

  1. Please let's use close_range on supported systems (apparently it became available in Linux 5.9)
  2. While researching this I learned that the "normal" way to close file descriptors is to list /proc/self/fd to find the descriptors that are actually open, and only if that fails do you fall back to the sysconf call and loop. I think taking this step when close_range is not available would be much better.
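The second request could be sketched roughly as follows, assuming a Linux-style /proc/self/fd. The function name `close_open_fds_from` is hypothetical. One caveat: `opendir()` allocates memory and is not async-signal-safe, so a real implementation that runs between fork and exec in a threaded program may need to read the directory with raw syscalls instead.

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close every open descriptor >= first by listing
 * /proc/self/fd. Returns 0 on success, or -1 if the directory cannot be
 * opened, in which case the caller should fall back to the sysconf loop. */
static int close_open_fds_from(int first)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir);    /* the listing itself uses a descriptor */
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;           /* skips "." and ".." */
        if (fd >= first && fd != dir_fd)
            (void)close((int)fd);
    }
    closedir(dir);              /* also releases dir_fd */
    return 0;
}
```

The work here is proportional to the number of descriptors that are actually open, not to the value returned by sysconf(_SC_OPEN_MAX).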

thomasjm, Jul 21 '22 13:07