NFFT.jl
Suggestion: infer plan type from `x` type
If I understand correctly, the current approach is that the user decides between the CPU and GPU versions by choosing one of:
- CPU: `p = plan_nfft(x, N)` or (currently equivalently, I think) `p = plan_nfft(Array, x, N)`
- GPU: `p = plan_nfft(CuArray, x, N)`
I'd prefer that the default plan type come from the type of `x`, by adding a method something like

```julia
plan_nfft(x, N) = plan_nfft(typeof(x), x, N)
```
So if x is a CuArray then by default the plan will be a GPU plan.
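The idea can be sketched with a small self-contained toy; the names `plan_like`, `CPUPlan`, and `GPUPlan` below are hypothetical stand-ins for illustration, not NFFT.jl's actual internals:

```julia
# Toy illustration of the proposed dispatch; all names here are
# hypothetical stand-ins, not the actual NFFT.jl internals.
abstract type AbstractToyPlan end
struct CPUPlan <: AbstractToyPlan end
struct GPUPlan <: AbstractToyPlan end   # stands in for a CuArray-backed plan

# Backend-specific methods, keyed on the array *type*:
plan_like(::Type{<:Array}, x, N) = CPUPlan()
# plan_like(::Type{<:CuArray}, x, N) = GPUPlan()  # would live in the GPU package

# The proposed default: infer the backend from the type of x.
plan_like(x::AbstractArray, N) = plan_like(typeof(x), x, N)

plan_like(rand(2, 16), (16, 16)) isa CPUPlan  # true
```

With such a forwarding method, loading a GPU package that defines the `CuArray` method would make GPU nodes select the GPU plan automatically.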
The reason is that then I can embed plan_nfft into downstream packages without forcing them to depend on CUDA.jl.
This might also "future proof" it so that someday when we have a OpenCLArray version then again it can inherit from x.
But I might be missing something?
And I have not really been able to test it out yet because I think the recent CUDA fixes are waiting for release updates.
I realize that if x is an Adjoint type (or a Range) then this would need additional work to get at the "base" of such types.
> But I might be missing something?
No, your proposal is clever. One argument against it would be that in practice the nodes do not necessarily need to be copied to the GPU. Our current implementation in CuNFFT actually precomputes the (sparse) convolution matrix on the CPU and then copies it to the GPU.
> And I have not really been able to test it out yet because I think the recent CUDA fixes are waiting for release updates.
Thanks for the trigger, will do later. The real issue is that we don't have CI for that and I therefore need to manually run the tests on a dedicated computer.
> I realize that if x is an Adjoint type (or a Range) then this would need additional work to get at the "base" of such types.
That is fixable by having a method that collects the nodes first and then performs the GPU/CPU dispatch. We currently do this here: https://github.com/JuliaMath/NFFT.jl/blob/master/AbstractNFFTs/src/derived.jl#L28. But the order of the methods would need to change. Will have a look at that.
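For illustration, `collect` already materializes such wrapped or lazy types into plain dense arrays, so a collect-first method gives the subsequent dispatch a concrete storage type to work with (a minimal sketch, independent of NFFT.jl):

```julia
# `collect` materializes wrapped/lazy types into plain arrays, so
# dispatching on typeof(collect(x)) sees the underlying storage type.
x = rand(3, 4)'                 # an Adjoint wrapper around a Matrix
typeof(x) <: AbstractMatrix     # true, but not a plain Matrix
typeof(collect(x))              # Matrix{Float64}

r = range(0, 1; length = 8)     # a lazy range, not a Vector
typeof(collect(r))              # Vector{Float64}
```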
In general: This GPU/CPU dispatching has not seen any real world usage. For instance in MRIReco.jl it is not yet possible to use GPUs without hacking the source code. So any real-world testing (e.g. in MIRT) would be appreciated.
ping @migrosser
> The real issue is that we don't have CI for that
Bummer. I would offer to run the tests on my GPU machine if this were a single package, but I don't really know how to do such testing properly for a repo with multiple nested packages. I used `]add` with the `#master` branch of CuNFFT and NFFT and then ran `]test NFFT`. I got one package dependency error, but the tests still ran and everything seemed fine except for one `ERROR: LoadError: UndefVarError: libfinufft not defined`, which I suspect is unrelated to #97. So it seems OK, but I do not have full confidence in my test given the package nesting...
> any real-world testing (e.g. in MIRT) would be appreciated.
Yep, I am working on it. In fact this suggestion originated from a user reported issue where the user clearly thought that having the nodes on the GPU would suffice to invoke CUDA (and so did I initially until I looked into it more): https://github.com/JeffFessler/mirt-demo/issues/5
One needs to `dev` NFFT and CuNFFT (and probably also AbstractNFFTs) and then run the tests. The GPU tests were commented out, though; I re-enabled them (https://github.com/JuliaMath/NFFT.jl/blob/master/test/runtests.jl#L15), tested on my GPU system (works!), and made a release. So right now CuNFFT should work.
By the way, CuNFFT is not of the same quality as the CPU implementation. We use the sparse-matrix trick since it is so simple to bring everything onto the GPU. As far as I have seen, this is the fastest GPU implementation: https://github.com/flatironinstitute/cufinufft. It would be very interesting to see how far one can get with pure Julia code; I have no experience in that direction so far. If there is interest in a competitive CuNFFT.jl implementation, we should probably create a dedicated issue and discuss it there. Another idea would be to create a wrapper around cufinufft for the moment.
Hello! We've recently made some progress with the points mentioned in this issue.
First off, we've moved away from a dedicated CUDA package and instead implemented a GPU-vendor-agnostic NFFT plan. This plan is realized as a package extension and does not depend on CUDA.jl. It also now (partially) works on AMD GPUs and, in general, on any AbstractGPUArray that implements an FFT. This means upstream packages don't need to load CuNFFT specifically anymore.
In order to implement that I used Adapt.jl to move the data to the given GPU type. That package essentially implements convert for "wrapped" types such as adjoint or a sparse array.
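As a small example of what Adapt.jl does with wrapped types (assuming Adapt.jl is installed; on the GPU side one would pass `CuArray` instead of `Array`):

```julia
using Adapt, LinearAlgebra

# Adapt recurses into wrapper types: it converts the parent storage
# and then rewraps it, so the Adjoint structure is preserved.
x = adjoint(rand(Float32, 3, 4))   # Adjoint{Float32, Matrix{Float32}}
y = adapt(Array{Float64}, x)       # convert the storage, keep the wrapper
typeof(parent(y))                  # Matrix{Float64}
y isa Adjoint                      # true
```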
I did not touch the `plan_nfft` interface yet, so one still has to call either `plan_nfft(CuArray, ...)` or `plan_nfft(ROCArray, ...)`. One issue is that we can't directly use the result of `typeof(x)` in `adapt`. The function expects the base type, i.e. `CuArray` instead of `CuArray{Float32, 2, ...}`.
This is a problem we also ran into in an upstream package, where I solved this issue by stripping the parameters from the type: https://github.com/JuliaImageRecon/LinearOperatorCollection.jl/blob/08c3ff7566da68268592f25184337d1001c1e2be/ext/LinearOperatorNFFTExt/NFFTOp.jl#L44
So while there is no public API to strip these parameters at the moment, we could use this workaround to implement `plan_nfft(x, ...)`.
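The parameter-stripping workaround can be written as a one-liner; note that it relies on `Base.typename`, an internal API, so it may break across Julia versions (shown here with CPU array types for illustration):

```julia
# Strip type parameters to recover the unparameterized wrapper type:
# e.g. Matrix{Float64} -> Array, and analogously
# CuArray{Float32, 2, ...} -> CuArray on the GPU side.
# Relies on internal API (Base.typename), so use with care.
basetype(T::Type) = Base.typename(T).wrapper

basetype(typeof(rand(Float32, 2, 3)))   # Array
basetype(typeof(view(rand(4), 1:2)))    # SubArray
```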
Oh, and we also now have CI for both CUDA and AMD GPUs.
@nHackel: yes, it would be good to rethink the interface based on all the experience you have gained recently.