
reproject_interp() fails silently for very large images

Open mhardcastle opened this issue 8 years ago • 20 comments

I'm trying to use reproject in a pipeline that makes a weighted mosaic of several 20000x20000 images. reproject_interp() gives essentially identical output to reproject_exact() for smaller output images (e.g. ~30000 x 20000), but at some point it starts failing silently, returning an output image in which every pixel, even those that would have been outside the original image, is set to some arbitrary value (comparable to the central value of the actual image). reproject_exact() can cope with the same images, though it is of course quite a lot slower; in any case, this looks like a bug. I can try to produce a self-contained bit of code that demonstrates this if needed.

mhardcastle avatar Dec 14 '16 18:12 mhardcastle

Thanks for the report - it would be great if you could create a 'minimal' failing example to reproduce this.

astrofrog avatar Dec 14 '16 18:12 astrofrog

OK, will do.

mhardcastle avatar Dec 14 '16 19:12 mhardcastle

OK, here we go. This triggers the problem for me.

Note you'll need a system with tens of GB of RAM to run it, and it will take a little while.

For me, reducing the big image size to say 20000 or less, or using reproject_exact() instead, makes this work fine (there should be a small random image in the bottom corner), but as written it produces a 40000x40000 image in which every pixel has the same value.

from reproject import reproject_interp, reproject_exact
from astropy.io import fits
from astropy.wcs import WCS
import numpy as np

# pick the algorithm to test
reproj = reproject_interp

mra = 180.0
mdec = 45.0
cellsize = 1.0 / 3600
ctype = ('RA---SIN', 'DEC--SIN')

# pick sizes here: rxsize/rysize are the output grid, xsize/ysize the input
rxsize = rysize = 40000
xsize = ysize = 1024

# WCS shared by the small input image and the much larger output grid
rwcs = WCS(naxis=2)
rwcs.wcs.ctype = ctype
rwcs.wcs.cdelt = (-cellsize, cellsize)
rwcs.wcs.crval = [mra, mdec]
rwcs.wcs.crpix = [1, 1]

rheader = rwcs.to_header()
rheader['NAXIS'] = 2
rheader['NAXIS1'] = rxsize
rheader['NAXIS2'] = rysize

header = rwcs.to_header()
header['NAXIS'] = 2
header['NAXIS1'] = xsize
header['NAXIS2'] = ysize

data = np.random.rand(ysize, xsize)

hdu = fits.PrimaryHDU(header=header, data=data)
r, footprint = reproj(hdu, rheader, hdu_in=0)
fits.PrimaryHDU(header=rheader, data=r).writeto('output.fits', overwrite=True)

mhardcastle avatar Dec 15 '16 08:12 mhardcastle

@mhardcastle - thanks for the example! Before I try to run it (it will take me a few days to get hold of a machine that can run it), can you check whether it is sufficient for one of the dimensions to be ~40000 for this to fail? (For instance, does 40000x4000 work?)

astrofrog avatar Dec 15 '16 09:12 astrofrog

Good question. No, I'm afraid 40000x4000 does work, though almost certainly the overall size of the output image doesn't need to be quite as big as 40000x40000 to provoke the problem.

mhardcastle avatar Dec 15 '16 09:12 mhardcastle

@mhardcastle - I can't find a machine to try this on, so could you run the following examples and let me know what happens?

io.fits

This is to check whether astropy.io.fits has issues when writing out arrays this size:

import numpy as np
from astropy.io import fits

N = 1024
M = 40000

# embed a small random array in the corner of a large zero array
array = np.random.random((N, N))

output = np.zeros((M, M))
output[:N, :N] = array

fits.writeto('output.fits', output, overwrite=True)

map_coordinates

Assuming the above example works, this is to check whether map_coordinates has issues with arrays this large:

import numpy as np
from astropy.io import fits
from scipy.ndimage import map_coordinates

N = 1024
M = 40000

array = np.random.random((N, N))

# sample an MxM grid extending to twice the input size, so the input
# should appear scaled into one quadrant of the output
x = np.linspace(0, N * 2, M)
y = np.linspace(0, N * 2, M)

X, Y = np.meshgrid(x, y)

coords = np.vstack([X.ravel(), Y.ravel()])

output = map_coordinates(array, coords).reshape(X.shape)

fits.writeto('output.fits', output, overwrite=True)

exact array size

If the above example fails, could you use it to figure out more precisely where the transition happens? For instance, could you check whether the behavior changes around sizes of 32768 (2^15)?
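A rough way to scan for the transition (a hedged sketch building on the map_coordinates example above; test_size and the candidate sizes are illustrative, and each call needs tens of GB of RAM):

import numpy as np
from scipy.ndimage import map_coordinates

N = 1024
array = np.random.random((N, N))

def test_size(M):
    # sample an MxM grid extending to twice the input size; with the
    # default mode='constant', points beyond the input map to cval=0
    x = np.linspace(0, N * 2, M)
    X, Y = np.meshgrid(x, x)
    coords = np.vstack([X.ravel(), Y.ravel()])
    output = map_coordinates(array, coords).reshape(X.shape)
    # if things worked, the far quadrant (well outside the input) is all zero
    return np.all(output[3 * M // 4:, 3 * M // 4:] == 0)

# scan sizes on either side of 2**15
for M in (2 ** 15 - 1, 2 ** 15, 2 ** 15 + 1):
    print(M, 'OK' if test_size(M) else 'suspicious')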

astrofrog avatar Dec 15 '16 10:12 astrofrog

(there was a typo in the first example as originally posted; it has been corrected above to M = 40000)

astrofrog avatar Dec 15 '16 10:12 astrofrog

The first one is fine, which is consistent with the problem not appearing with reproject_exact(). I'll report back on the second when it's finished.

I can arrange an account for you on a cluster with a bunch of suitable machines, but let's see if that's necessary first.

mhardcastle avatar Dec 15 '16 10:12 mhardcastle

The second code fragment also runs fine (i.e. a scaled version of the 1024x1024 array appears in the lower-left quadrant of the output array).

I'll check whether 2^15 is special using my example above. It's certainly close to the boundary where things stop working.

mhardcastle avatar Dec 15 '16 10:12 mhardcastle

@mhardcastle - another test (if you don't mind) would be to edit the code here:

https://github.com/astrofrog/reproject/blob/master/reproject/interpolation/high_level.py#L91

to force _reproject_full to get called instead of _reproject_celestial, just to check whether the bug is in _reproject_celestial.
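If editing the installed file is awkward, monkeypatching should have the same effect. A minimal sketch, assuming both functions are reachable as module-level names in the high_level module linked above (the exact module layout is an assumption):

from reproject.interpolation import high_level

# rebind the fast celestial path to the generic implementation, so
# reproject_interp goes through _reproject_full instead
high_level._reproject_celestial = high_level._reproject_full

# ...then run the failing example unchanged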

astrofrog avatar Dec 15 '16 11:12 astrofrog

Well, that's remarkable: an image of 2^15 x 2^15 demonstrates the problem with my test code above, but an image of 2^15-1 on a side does not. So I guess something about image sizes >= 2^30 pixels is the root of the problem.
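For scale (back-of-the-envelope arithmetic only, not a diagnosis), 2^30 pixels is also the point at which an array holding two coordinates per output pixel outgrows a signed 32-bit index:

M = 2 ** 15
pixels = M * M
print(pixels)                    # 1073741824 == 2**30
print(pixels * 8 / 1e9)          # ~8.6 GB per float64 array of that size
print(2 * pixels > 2 ** 31 - 1)  # True: a (2, M, M) array has more elements than INT32_MAX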

I'll have a look at the test you suggest.

mhardcastle avatar Dec 15 '16 12:12 mhardcastle

Just to check, what OS are you using?

astrofrog avatar Dec 15 '16 13:12 astrofrog

Linux: Scientific Linux 7.

mhardcastle avatar Dec 15 '16 13:12 mhardcastle

@mhardcastle - just to check, did you have a chance to try the experiment above with _reproject_full?

In any case, I think this does highlight the need for a way to do the reprojection in chunks (https://github.com/astrofrog/reproject/issues/37).

astrofrog avatar Dec 16 '16 16:12 astrofrog

I haven't had a spare moment, but I haven't forgotten -- I need to get this resolved to make progress on another project (or give up and go back to montage-wrapper ...) so it's not going to get buried.

mhardcastle avatar Dec 16 '16 16:12 mhardcastle

(And yes, chunking would be very useful, particularly if it allowed parallel mode to work in reproject_exact(), which crashes on these big images, presumably because it runs out of memory.)

mhardcastle avatar Dec 16 '16 16:12 mhardcastle

@mhardcastle - see https://github.com/astrofrog/reproject/issues/37#issuecomment-267645100

astrofrog avatar Dec 16 '16 17:12 astrofrog

Just checking in to see whether there are any recent updates or workarounds available for this. I have a fairly large image that reproject_interp is having issues with: the returned array is filled with either NaN or a single number. @astrofrog, the link above gives me a page-not-found error. I did see a reference to another issue that chunks the data to keep reproject_interp from falling over, but when I tried to reproduce that approach it seemed that some of the core utilities had been restructured, and I am not familiar enough with the code base to find the equivalent functions.

tjgalvin avatar Dec 24 '20 03:12 tjgalvin

I would also love to use chunking with reproject_and_coadd()

drtobybrown avatar May 17 '21 23:05 drtobybrown

We have been using the chunking workaround discussed in 2016 in production ever since: see https://github.com/mhardcastle/ddf-pipeline/blob/master/utils/reproj_test.py
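For anyone who wants the gist without reading the pipeline code, the idea is roughly as follows (a minimal sketch, not the ddf-pipeline implementation; reproject_chunked is an illustrative name and the CRPIX bookkeeping is the part to double-check):

import numpy as np
from reproject import reproject_interp

def reproject_chunked(hdu, out_wcs, shape_out, blocksize=2048):
    # reproject onto a large output grid one tile at a time, so no
    # single call ever builds full-size coordinate arrays
    out = np.full(shape_out, np.nan)
    ny, nx = shape_out
    for y0 in range(0, ny, blocksize):
        for x0 in range(0, nx, blocksize):
            y1 = min(y0 + blocksize, ny)
            x1 = min(x0 + blocksize, nx)
            # same projection, with the (1-based) reference pixel
            # shifted so tile pixel (0, 0) lands at output (x0, y0)
            tile_wcs = out_wcs.deepcopy()
            tile_wcs.wcs.crpix = [out_wcs.wcs.crpix[0] - x0,
                                  out_wcs.wcs.crpix[1] - y0]
            tile, _ = reproject_interp(hdu, tile_wcs,
                                       shape_out=(y1 - y0, x1 - x0))
            out[y0:y1, x0:x1] = tile
    return out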

mhardcastle avatar May 18 '21 07:05 mhardcastle

Hi everyone, with the latest version of reproject you can now pass e.g. block_size=(2048, 2048) to operate in chunks. I'll close the issue now, but feel free to re-open it if you are still having issues with large arrays/chunking!
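For example (a minimal usage sketch; hdu and target_header stand in for your own inputs):

from reproject import reproject_interp

# process the output in 2048x2048 tiles rather than all at once
array, footprint = reproject_interp(hdu, target_header,
                                    block_size=(2048, 2048))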

astrofrog avatar Jun 13 '24 10:06 astrofrog