Support for NFFTs of real-valued arrays

Open roflmaostc opened this issue 2 years ago • 6 comments

Hi,

As far as I can see, there is no support for real-valued input arrays?

However, for those cases an rfft could save roughly a factor of 2 in computation.

Are there some issues with that?
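
To illustrate with plain torch.fft what I mean (just a sketch, independent of torchkbnufft): for real input, rfft2 only computes the non-redundant half of the spectrum, and the other half follows from Hermitian symmetry.

import torch

x = torch.rand(8, 8, dtype=torch.float64)  # real-valued input

full = torch.fft.fft2(x)   # (8, 8) complex spectrum
half = torch.fft.rfft2(x)  # (8, 5): only the non-redundant half along the last dim

# rfft2 matches the first half of the full spectrum ...
print(torch.allclose(full[:, :5], half))  # True
# ... and the real image is recoverable from the half spectrum alone
print(torch.allclose(torch.fft.irfft2(half, s=x.shape), x))  # True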

Best,

Felix

roflmaostc avatar Apr 30 '23 12:04 roflmaostc

Hello @roflmaostc, it wasn't a priority for me since MRI data is always complex. I can't implement it myself right now due to time constraints, but I could consider a PR if someone else did it.

mmuckley avatar May 01 '23 14:05 mmuckley

I can try to help, since I would love that feature! Maybe it could be specified with another parameter, since doing it implicitly might be dangerous: fftmethod="fft" or fftmethod="rfft".

What is required?

Would that be here? https://github.com/mmuckley/torchkbnufft/blob/20c45a23b4d2894fd118904131129d51104cd64f/torchkbnufft/_nufft/fft.py#L9

There are probably more steps needed for the interpolation?
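
Something along these lines is what I had in mind for the parameter (only a sketch; grid_fft and fftmethod are made-up names, not the actual code in fft.py):

import torch

def grid_fft(image: torch.Tensor, fftmethod: str = "fft") -> torch.Tensor:
    # Hypothetical helper illustrating the proposed fftmethod switch.
    if fftmethod == "fft":
        # current behaviour: full complex FFT over the image dims
        return torch.fft.fftn(image, dim=(-2, -1))
    if fftmethod == "rfft":
        # real input: compute only the non-redundant half of the spectrum
        if image.is_complex():
            raise ValueError("fftmethod='rfft' requires a real-valued input")
        return torch.fft.rfftn(image, dim=(-2, -1))
    raise ValueError(f"unknown fftmethod: {fftmethod}")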

roflmaostc avatar May 01 '23 14:05 roflmaostc

I think the interpolator could be a little bit involved. First you'd want to check that it has no imaginary values (I don't remember). Then there might be a question of reflecting the results to the Hermitian-symmetric part of Fourier space so that the k-nearest neighbors algorithm works correctly.
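
For concreteness, the reflection I mean would look roughly like this (1D sketch only, untested against the actual interpolator; reflect_to_full_spectrum is just an illustrative name):

import torch

def reflect_to_full_spectrum(half: torch.Tensor, n: int) -> torch.Tensor:
    # Rebuild the full length-n spectrum from an rfft half-spectrum so that
    # downstream k-space interpolation could in principle stay unchanged.
    # The missing bins satisfy X[k] == conj(X[n - k]).
    tail = torch.flip(half[..., 1 : n - half.shape[-1] + 1].conj(), dims=(-1,))
    return torch.cat((half, tail), dim=-1)

x = torch.rand(16, dtype=torch.float64)
full = reflect_to_full_spectrum(torch.fft.rfft(x), n=16)
print(torch.allclose(full, torch.fft.fft(x)))  # True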

After doing these things I'm actually guessing there would be no speedup, because the interpolator is the bottleneck anyway. Maybe a deeper look into the NUFFT algorithm to consider real inputs would be necessary to get an efficient implementation. I don't think it's as simple as changing the forward FFT op.

mmuckley avatar May 01 '23 15:05 mmuckley

What is the estimated overhead in your case? Is the interpolator really that slow? Normal cubic interpolation is much faster, so I wonder what's so different in this case. Do other libraries do better here?

I observed the following (NUFFT along the last two dims, with a leading batch dim). Does that match your experience?

voxels = torch.zeros(100, 1, 768, 768, device="cuda") + 0j
# 300 radial angles; prepare_nufft is my own helper, not torchkbnufft API
nufft, adjnufft, ktraj = prepare_nufft(voxels, 300, numpoints=3, gridfactor=1.25, device="cuda")

%%timeit
torch.fft.fft2(voxels + 0j)
torch.cuda.synchronize()
# 9 ms

%%timeit
nufft(voxels + 0j, ktraj)
torch.cuda.synchronize()
# 61 ms

In that case, an RFFT really would only save ~4 to 5 ms of the 61 ms (maybe a little more because of the 1.25× grid padding).

roflmaostc avatar May 01 '23 15:05 roflmaostc

85% of the runtime in the interpolator seems a little high, but it could be possible for some trajectories; it's trajectory-dependent. If you only had one radial spoke, it's possible the FFT would take longer than the interpolation.

mmuckley avatar May 02 '23 00:05 mmuckley

I actually used 300 radial spokes of length 768 each.

And a batch size of maybe 100 at most.

What should the shape of ktraj be in this case? In my case it was (2, 768*300).

To be honest, I was a bit confused by the coil dimension etc. in the documentation.

Should I rearrange the spokes along another dim for more parallelization?
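
For reference, this is roughly how I build the trajectory and call the operator (a sketch of my understanding of the API; prepare_nufft above is just my own wrapper, and please correct me if the shapes are off):

import torch
import torchkbnufft as tkbn

nspokes, spokelength = 300, 768
im_size = (768, 768)

# radial trajectory in radians/voxel, flattened to shape (2, nspokes * spokelength)
angles = torch.arange(nspokes) * torch.pi / nspokes
radii = torch.linspace(-torch.pi, torch.pi, spokelength)
kx = torch.outer(torch.cos(angles), radii)
ky = torch.outer(torch.sin(angles), radii)
ktraj = torch.stack((kx.flatten(), ky.flatten()), dim=0)

# numpoints=3 and a 1.25x oversampled grid, matching my helper above
nufft_ob = tkbn.KbNufft(im_size=im_size, grid_size=(960, 960), numpoints=3)

# image layout (batch, coils, H, W); batch of 2 here to keep the sketch light
image = torch.rand(2, 1, *im_size) + 0j
kdata = nufft_ob(image, ktraj)
print(kdata.shape)  # (2, 1, nspokes * spokelength)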

roflmaostc avatar May 02 '23 06:05 roflmaostc