Avoid writing out a temporary memmap if the input data can already be recognized as a memmap

Open astrofrog opened this issue 2 years ago • 1 comment

Currently, in parallel mode, if a user specifies a filename for the input, we load the data with astropy.io.fits. Even if that data is memory-mapped, it is then written out to a new memory-mapped file for the parallel computation (to avoid copying the in-memory array to every process).

We should find a way, whenever possible, to avoid writing out a new memmap if the original data is backed by a file on disk.

I'm not sure there is a way to do this when HDU objects are passed in, since hdu.data is a regular NumPy array and the details of the memmap are hidden in the buffer. However, if a filename is passed, we should be able to set up a memmap ourselves using BITPIX and the NAXISn keywords in the HDU header. If parallel mode is specified, we could then warn when an HDU or HDUList is passed that this is not optimal and that a filename should be passed instead.
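A rough sketch of what building the memmap from the header could look like. The helper name memmap_from_header and the explicit offset argument are hypothetical; the sketch assumes an uncompressed image HDU, ignores BSCALE/BZERO scaling, and expects the caller to know the byte offset at which the data block starts. The header only needs to behave like a mapping, so a plain dict works for illustration.

```python
import numpy as np

# Standard FITS BITPIX values mapped to big-endian NumPy dtypes.
BITPIX_TO_DTYPE = {8: "u1", 16: ">i2", 32: ">i4", 64: ">i8", -32: ">f4", -64: ">f8"}


def memmap_from_header(filename, header, offset):
    """Open the data block of an image HDU as a read-only np.memmap.

    Hypothetical helper: `header` is any mapping with BITPIX, NAXIS and
    NAXIS1..NAXISn, and `offset` is the byte position of the data block.
    """
    dtype = np.dtype(BITPIX_TO_DTYPE[header["BITPIX"]])
    naxis = header["NAXIS"]
    # FITS lists the fastest-varying axis first (NAXIS1), so the NumPy
    # shape is the NAXISn values in reverse order.
    shape = tuple(header[f"NAXIS{i}"] for i in range(naxis, 0, -1))
    return np.memmap(filename, mode="r", dtype=dtype, shape=shape, offset=offset)
```

Each worker process could then open the same file with the same arguments rather than receiving a copy of the array.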

We should also make sure we support passing in np.memmap objects and handle them properly (again without writing the arrays out a second time).
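One possible way to handle this, as a minimal sketch: reuse the input when it is already a file-backed np.memmap, and only fall back to writing a temporary file otherwise. The function name as_shared_memmap and its return convention are made up for illustration and are not part of any existing API. Note that a slice of a memmap is still an np.memmap, but its offset attribute does not describe where the slice starts, so real code would need more care before re-opening the file in another process.

```python
import os
import tempfile

import numpy as np


def as_shared_memmap(array, tmp_dir=None):
    """Return (memmap, created) for use by worker processes.

    Hypothetical helper: `created` is False when the input was already a
    file-backed np.memmap and nothing was written to disk.
    """
    if isinstance(array, np.memmap) and array.filename is not None:
        # Already backed by a file on disk, so reuse it as-is.
        return array, False

    # Fall back to the current behaviour: dump the data to a temporary
    # file and memory-map that file.
    array = np.asarray(array)
    fd, path = tempfile.mkstemp(suffix=".np", dir=tmp_dir)
    os.close(fd)
    out = np.memmap(path, mode="w+", dtype=array.dtype, shape=array.shape)
    out[...] = array
    out.flush()
    return out, True
```

The created flag lets the caller decide whether there is a temporary file to clean up afterwards.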

Sep 14 '23 09:09 astrofrog

Given an HDU, we can actually do:

np.memmap(hdu.fileinfo()['file'].name, mode='r', dtype=hdu.data.dtype, shape=hdu.data.shape, offset=hdu.fileinfo()['datLoc'])

to extract a NumPy memmap, so perhaps that's the way to go for FITS input.

Sep 14 '23 10:09 astrofrog