reproject icon indicating copy to clipboard operation
reproject copied to clipboard

Provide a way to avoid dumping dask arrays to Numpy arrays

Open astrofrog opened this issue 8 months ago • 1 comments

It's now possible to index dask arrays using advanced Numpy indexing, which means one can do e.g.:

dask_array.vindex[ipix, jpix]

so this means that for the nearest neighbor method in reproject_interp we could just do this directly on input dask arrays. Similarly, for bilinear interpolation one would need to fetch four indices for each output pixel, but it seems doable.

We should investigate to see if the performance implications of this are prohibitive, but as long as it's not too slow it might be a good way and the only way to avoid dumping gigantic arrays to disk.

If we could do this it would make things easier for reprojecting from HiPS to standard projections.

astrofrog avatar Jul 04 '25 11:07 astrofrog

Performance doesn't seem too bad:

import time
import numpy as np
from dask import array as da

array = da.random.random((10000, 10000, 10000), chunks=(500, 500, 500))

N = 10_000_000

i = np.random.randint(4000, 4500, N)
j = np.random.randint(4000, 4500, N)

time1 = time.time()
result = array.vindex[i, j, 0].compute()
time2 = time.time()

print(time2 - time1)

outputs 5 seconds. Seems reasonable as it allows the data to not be completely dumped to disk.

astrofrog avatar Jul 04 '25 11:07 astrofrog

Done in #532

astrofrog avatar Nov 14 '25 14:11 astrofrog