fitsio icon indicating copy to clipboard operation
fitsio copied to clipboard

please release the GIL when calling the C library

Open fommil opened this issue 1 month ago • 12 comments

Hi, thank you for releasing this library, I am using it as one of my very minimal dependencies over in https://github.com/fommil/luddcam/blob/2673cb613d6cf53daac73991f118dfb5e40221a1/luddcam_capture.py#L318

I am spawning a thread to do my fits writing, with the hopes that any heavy lifting would be releasing the GIL and therefore not impacting the timings of my app. However, I sadly discovered that fitsio is still running within the confines of the GIL, and therefore blocks progress on all other threads. To declare the long-running call as exempt from the GIL, it is my understanding that there needs to be CPython or native binding calls... this is not something we can do in pure python and not something I can workaround in my app.

The GIL is going away eventually, but it'll be many years before it will be default.

Compression in particular makes the writes very slow, so workaround is to use fitsio to write out without compression then shell out to fpack (external form compression)

subprocess.run(["fpack", "-D", "-g1", self.out], check=True)

(I've noticed that this tends to be much quicker than using fitsio to do the compression... 10x faster, I don't know why since it's all happening in the native code, maybe fpack does multicore or something)

It would, of course, be ideal to not have any GIL blocking even when writing without compression.

fommil avatar Nov 25 '25 17:11 fommil

I think we have the C API write directly into the memory owned by numpy arrays. Thus I don't think we can actually release the GIL right now. @esheldon should comment to ensure I have this right.

beckermr avatar Nov 25 '25 18:11 beckermr

From the numpy docs

In the future, we may add locking to ndarray to make writing multithreaded algorithms using NumPy arrays safer, but for now we suggest focusing on read-only access of arrays that are shared between threads, or adding your own locking if you need to mutation and multithreading.

See https://numpy.org/doc/2.3/reference/thread_safety.html.

beckermr avatar Nov 25 '25 18:11 beckermr

If you have an example of the speed difference for writing compressed data, I am happy to take a look!

beckermr avatar Nov 25 '25 18:11 beckermr

this old PR is relevant https://github.com/esheldon/fitsio/pull/35

We did not adopt it at the time, but this shows how we could do it

esheldon avatar Nov 25 '25 18:11 esheldon

In that PR, I would not wrap the C-level read functions in NOGIL statements. I don't think this is safe for memory owned by a numpy array. There is nothing stopping python from changing the destination numpy array memory while we are trying to write to it.

I am also concerned about the NOGIL write statements. Unless we make a copy of the numpy data and write to the file via C using the copy, python could cleanup the memory we are reading before we are done with it.

beckermr avatar Nov 25 '25 18:11 beckermr

Yeah I can make a standalone example for sure, using one of my fits test files. I'll paste it here later today, and I'll show you where to grab a file from.

fommil avatar Nov 26 '25 09:11 fommil

I must have measured something wrong yesterday, I was probably measuring the impact on the other thread rather than the direct CPU performance, but certainly fitsio is faster at straight compression, but it's holding the GIL, so not suitable in multithreaded. So to avoid blocking an app, we have to go the (slower) fpack route in a subprocess.

I used the file at https://github.com/fommil/luddcam/tree/main/test_data/sony_a7iii/m31/exposures

When running this I get (no metadata, it doesn't impact perf...)

$ python3 fits_perf.py 
time to write compressed file with fitsio (GIL): 0.61
time to write uncompressed file with fitsio (GIL): 0.02
time to compress file with fpack (NOGIL): 0.75
import fitsio
import os
import subprocess
import time

if __name__ == "__main__":
    input = "light_00001.fit.fz"
    data = fitsio.FITS(input)[1].read()

    out = "output_fitsio_compressed.fits"
    if os.path.exists(out):
        os.remove(out)
    start = time.perf_counter()
    with fitsio.FITS(out, "rw") as fits:
        fits.write(data, compress="GZIP_1")
    end = time.perf_counter()
    # unfortunately flushing holds the GIL too...
#    with open(out, "rb+") as f:
#        os.fsync(f.fileno())
    print(f"time to write compressed file with fitsio (GIL): {end - start:.2f}")

    out = "output_fitsio_fpack.fits"
    if os.path.exists(out):
        os.remove(out)
    start = time.perf_counter()
    with fitsio.FITS(out, "rw") as fits:
        fits.write(data, compress=None)
    # with open(out, "rb+") as f:
    #     os.fsync(f.fileno())
    end = time.perf_counter()
    print(f"time to write uncompressed file with fitsio (GIL): {end - start:.2f}")

    if os.path.exists(out + ".fz"):
        os.remove(out + ".fz")
    start = time.perf_counter()
    subprocess.run(["fpack", "-D", "-g1", out], check=True)
    # with open(out + ".fz", "rb+") as f:
    #     os.fsync(f.fileno())
    end = time.perf_counter()
    print(f"time to compress file with fpack (NOGIL): {end - start:.2f}")

fommil avatar Nov 26 '25 11:11 fommil

I said

In that PR, I would not wrap the C-level read functions in NOGIL statements. I don't think this is safe for memory owned by a numpy array. There is nothing stopping python from changing the destination numpy array memory while we are trying to write to it.

I am also concerned about the NOGIL write statements. Unless we make a copy of the numpy data and write to the file via C using the copy, python could cleanup the memory we are reading before we are done with it.

but I actually think this is not quite right.

The python call to the C function still holds a reference to the numpy array. Thus I don't think anything will be moved in memory or deallocated while the C api calls are happening.

It is true though that in general IO with fitsio is NOT thread safe. Holding the GIL avoids this issue but we could instead punt to the user. In general, python usually punts to the user on these issues and so this might be better.

beckermr avatar Nov 26 '25 12:11 beckermr

Here is some text from the cfitsio user guide:

2.4 Using CFITSIO in Multi-threaded Environments

CFITSIO can be used either with the POSIX pthreads interface or the OpenMP interface for multi-threaded parallel programs. When used in a multi-threaded environment, the CFITSIO library must be built using the -D_REENTRANT compiler directive. This can be done using the following build commands:

./configure --enable-reentrant make

A function called fits_is_reentrant is available to test whether or not CFITSIO was compiled with the -D_REENTRANT directive. When this feature is enabled, multiple threads can call any of the CFITSIO routines to simultaneously read or write separate FITS files. Multiple threads can also read data from the same FITS file simultaneously, as long as the file was opened independently by each thread. This relies on the operating system to correctly deal with reading the same file by multiple processes. Different threads should not share the same ’fitsfile’ pointer to read an opened FITS file, unless locks are placed around the calls to the CFITSIO reading routines. Different threads should never try to write to the same FITS file.

So to make this work, we'd need to

  • add the configure flag to our vendored build
  • only release the GIL if fits_is_reentrant returns 1/true

beckermr avatar Nov 26 '25 12:11 beckermr

One could argue that it's on the python dev to ensure they aren't doing multiple concurrent calls, but your suggestion is definitely backwards compatible. I tried to see if Debian 13 / Raspberry Pi OS are using those flags but I got lost trying to navigate the sites. I'd be very surprised that cfitsio is not thread safe by default, surely lots of pipelines involve parallel processing of images?

fommil avatar Nov 26 '25 12:11 fommil

For sure in our bundled/vendored builds, cfitsio is not thread safe. This log (https://github.com/esheldon/fitsio/actions/runs/19636723733/job/56229163252) does not have the macro defined when the code is being compiled.

Other devs would have to explicitly add the flag and know about it. That seems unlikely.

beckermr avatar Nov 26 '25 12:11 beckermr

FYI after lunch I had the energy to look again at debian, https://packages.debian.org/trixie/libcfitsio-dev

Downloading and expanding the .tar.xz file, then looking in the rules file, we can see

override_dh_auto_configure-arch:
	dh_auto_configure -- --host=$(DEB_HOST_GNU_TYPE) --enable-reentrant --with-bzip2

That's in 4.6.2-2, I don't know how long it's been there, but that's also the stable release version, so probably the same one that's on raspberry pis (my target platform).

fommil avatar Nov 26 '25 13:11 fommil