cufinufft Rapidly degrading adjoint test in 3D

Dear all,

I was testing cufinufft via the Python wrapper on some 3d problems, and I stumbled upon some accuracy issues when performing the adjoint test. That is, given the NFFT linear operator F,

dot(F*x,y) \approx dot(x,F'*y).

The adjoint test passes for very small-sized problems (32,32,32) (rel err around 1e-7 on my machine) but it gets quickly unacceptable for size (128,128,128) (rel err around 0.5, worst case). You can find below the code I was running.

Please let me know if this is somehow expected, or if it can be mitigated somehow.

Regards, Gabrio

# Module import
import numpy as np
import pycuda.autoinit
from pycuda.gpuarray import GPUArray, to_gpu
from cufinufft import cufinufft

# NFFT eval
def cufinufft3d_eval(u, kx, ky, kz, eps=1e-6):
     
     # Allocate memory on the GPU for output variable
     n_transf = 1
     U_gpu = GPUArray((n_transf, len(kx)), dtype=u.dtype)
     
     # Initialize the plan and set the points
     plan = cufinufft(2, u.shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))
     
     # Execute the plan, reading from the array u and storing the result in U_gpu
     plan.execute(U_gpu, to_gpu(u))
     
     # Retreive the result from the GPU
     return U_gpu.get()[0,:]

def cufinufft3d_adjeval(U, kx, ky, kz, shape, eps=1e-6):
     
     # Allocate memory on the GPU for output variable
     n_transf = 1
     u_gpu = GPUArray((n_transf, shape[0], shape[1], shape[2]), dtype=U.dtype)
     
     # Initialize the plan and set the points
     plan = cufinufft(1, shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))
     
     # Execute the plan, reading from the array U and storing the result in u_gpu
     plan.execute(to_gpu(U), u_gpu)
     
     # Retreive the result from the GPU
     return u_gpu.get()[0,:,:,:]

# 3D settings
# shape = (32,32,32)
# shape = (64,64,64)
shape = (128,128,128)
eps = 1e-7
u = np.random.randn(*shape).astype(np.complex64)+1j*np.random.randn(*shape).astype(np.complex64)
M = 10
kx = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
ky = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
kz = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)

# Adjoint test
U  = cufinufft3d_eval(u, kx.flatten(), ky.flatten(), kz.flatten(), eps=eps)
V = np.random.randn(*U.shape).astype(np.complex64)+1j*np.random.randn(*U.shape).astype(np.complex64)
v = cufinufft3d_adjeval(V, kx.flatten(), ky.flatten(), kz.flatten(), shape, eps=eps)
a = np.vdot(U.flatten(),V.flatten())
b = np.vdot(u.flatten(),v.flatten())
np.linalg.norm(a-b)/np.linalg.norm(a) # rel err

Jan 14 '22 15:01 grizzuti

Thanks for bringing this up. Have you observed this with other NUFFT implementations, such as Finufft?

Will see if I can reproduce this.

Feb 04 '22 10:02 janden

So I'm seeing errors of order one for all shapes when I run this locally. Will try on another machine. I can't see any real issue with your script though, so it's not clear to me what could be happening.

Feb 04 '22 11:02 janden

Hi Joakim, I did the same experiment with the cpu implementation of finufft; no issues there! Also, gpu 2D is fine. On my machine, inaccuracies are more pronounced for larger 3D experiments.

Feb 04 '22 11:02 grizzuti

So the problem seems to be hardware dependent. On my laptop (GeForce 940MX), all three sizes (32, 64, and 128) give an error of order one. Same when I run this on a P100. However, when I run on the V100/A100, I get an error of order 1e-7, 1e-6, and 1e-5, respectively.

My first guess was that this is memory related, but the P100 and V100 both have 16 GB of memory, so that's not it. Will dig deeper into this.

Feb 04 '22 16:02 janden