cufinufft
Implementing 1.25 upsampling factor with precomputed Horner kernel
Hi all,
Thank you very much for everyone's work on this library. What would be required to implement the upsampling ratio sigma of 1.25 in cufinufft, in a similar way to the CPU FINUFFT version? From what I understand, FINUFFT has precomputed Horner coefficients for both 1.25 and 2.0, while cufinufft only has them for 2.0.
At first glance, src/cuspreadinterp.h would need to change, but I don't know whether any further modifications would be necessary.
Hi Aaron, You are right. Are you running out of RAM for the FFTs? (That would be a good reason to implement this!) It would be very useful if you could try this out in a draft PR. I think merely updating the code you mention, and github.com/flatironinstitute/cufinufft/blob/master/contrib/spreadinterp.cpp, to match the functions in FINUFFT's https://github.com/flatironinstitute/finufft/blob/master/src/spreadinterp.cpp, and maybe adding a flag/switch in the tester routines, should be enough. I don't think it is too hard.
I hope @MelodyShih will also chime in about what would need to be changed. She is finishing her PhD, so she is quite busy.
She might remember if there's some reason we didn't do this. (Maybe since the spreading kernels are wider for upsampfac=1.25, it doesn't help much on the GPU side?)
I may be able to help if you get stuck. Best, Alex
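To put the FFT memory question in rough numbers, here is a small Python sketch comparing the upsampled grid size for the two factors. It assumes the fine grid has about upsampfac*N points per dimension and stores single-precision complex values (8 bytes each); the actual libraries also round each dimension up to an FFT-friendly size, which is ignored here. `fine_grid_bytes` is a hypothetical helper for illustration, not part of FINUFFT or cufinufft.

```python
import math


def fine_grid_bytes(n_modes, upsampfac, bytes_per_complex=8):
    """Approximate memory (bytes) of the upsampled FFT grid.

    Assumption: each fine-grid dimension is ceil(upsampfac * N); the real
    libraries additionally round up to an FFT-friendly size.
    """
    nf = [math.ceil(upsampfac * n) for n in n_modes]
    return math.prod(nf) * bytes_per_complex


n_modes = (256, 256, 256)  # example 3D problem size, chosen for illustration
mem_2 = fine_grid_bytes(n_modes, 2.0)
mem_125 = fine_grid_bytes(n_modes, 1.25)

print(f"upsampfac=2.0 : {mem_2 / 2**20:.0f} MiB")
print(f"upsampfac=1.25: {mem_125 / 2**20:.0f} MiB")
print(f"ratio         : {mem_2 / mem_125:.2f}")
```

Under these assumptions the saving scales as (2/1.25)^d, so it is about a factor of 4 in 3D, which is why a smaller upsampling factor matters most for large 3D transforms.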
Hi Alex,
I have been experimenting a bit with FINUFFT and CUFINUFFT on an MRI reconstruction. With FINUFFT, changing the upsampling factor to 1.25 leads to faster reconstruction, so I am wondering whether it would have the same effect with CUFINUFFT. I don't know if there are further modifications to make beyond matching this file: https://github.com/flatironinstitute/finufft/blob/master/src/spreadinterp.cpp
I am not quite sure I understand how the testing routines work, so I don't know where the flag would belong.
Hi,
I think the changes you made should be enough. Other places to look at are the checks here https://github.com/flatironinstitute/cufinufft/blob/master/src/2d/spread2d_wrapper.cu#L661 and here https://github.com/flatironinstitute/cufinufft/blob/master/src/3d/spread3d_wrapper.cu#L1181. We might need to find a bin size that works for all possible tolerances (or restrict the cases in which the smaller upsampling factor can be used?).
E.g., for single precision, if my calculation is correct, the largest ns is 12 (eps = 6e-8). Then for 3D problems the check will fail with the current default bin size (16,16,2): (16+12)*(16+12)*(2+12)*2*4 = 87808 > 49152. We will be fine for 1D and 2D problems: (16+12)*(16+12)*8 = 6272 < 49152.
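The arithmetic above can be reproduced with a short Python sketch. Two things here are assumptions rather than quotes from the code: the kernel-width heuristic in `kernel_width` is written from memory of FINUFFT's spreader setup (and omits its clamping of ns to a minimum and maximum), and the padded bin size is taken as bin + ns, as in the calculation above. Both function names are hypothetical.

```python
import math

SHARED_MEM_LIMIT = 49152  # bytes; the limit used in the calculation above


def kernel_width(eps, upsampfac):
    """Estimate the spreading kernel width ns for a given tolerance.

    Assumption: this mirrors the heuristic recalled from FINUFFT's spreader
    setup; the library's clamping of ns is left out.
    """
    if upsampfac == 2.0:
        return math.ceil(-math.log10(eps / 10.0))
    return math.ceil(-math.log(eps) / (math.pi * math.sqrt(1.0 - 1.0 / upsampfac)))


def shared_mem_bytes(bin_sizes, ns, bytes_per_complex=8):
    """Shared memory for one bin padded by ns in every dimension.

    bytes_per_complex=8 is single-precision complex (2 floats x 4 bytes).
    """
    padded = [b + ns for b in bin_sizes]
    return math.prod(padded) * bytes_per_complex


ns = kernel_width(6e-8, 1.25)
for bins in [(16, 16), (16, 16, 2)]:
    need = shared_mem_bytes(bins, ns)
    verdict = "fits" if need <= SHARED_MEM_LIMIT else "exceeds limit"
    print(f"ns={ns}, bins={bins}: {need} bytes, {verdict}")
```

A helper like this makes it easy to scan candidate 3D bin sizes for one that stays under the limit at the widest kernel, which is the open question raised above.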
This was fixed by https://github.com/flatironinstitute/finufft/pull/488. Closing.