accfft
GPU accfft - gives error for certain decompositions
Hi,
I downloaded the latest accfft yesterday and was able to install it without any errors. The GPU FFT on a Tesla P100 gives an error in the FFT computation for certain decompositions. Please see the dump below:
mpirun -np 4 ./step1_gpu 512 512 512
Input c_dim[0] * c_dims[1] != nprocs. Automatically switching to c_dims[0] = 2 , c_dims_1 = 2
L1 Error of iFF(a)-a: 3826.97
Relative L1 Error of iFF(a)-a: 0.0101613
GPU Timing for FFT of size 512*512*512
Setup 0.865407
FFT 0.246016
IFFT 0.333215
Each MPI process is assigned a different GPU. If I switch to a different decomposition, i.e. c_dims[0] = 4 and c_dims[1] = 1, the FFT computation is correct (see dump below):
mpirun -np 4 ./step1_gpu 512 512 512
c_dims[0] = 4, c_dims_1 = 1
L1 Error is 1.23944e-09
Relative L1 Error is 3.29094e-15
Results are CORRECT!
GPU Timing for FFT of size 512*512*512
Setup 1.79198
FFT 0.142055
IFFT 0.143942
Could you please help in resolving this?
regards, Naga
Try changing line 76 of step1_gpu.cpp from
cudaMalloc((void**) &data2, isize[0] * isize[1] * isize[2] * sizeof(double));
to
cudaMalloc((void**) &data2, alloc_max);
This should fix any issues caused by the larger temporary space required during transposes.
I wonder if this was also the reason for #15
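For anyone applying the suggested change, here is a minimal sketch of the sizing logic in plain C++. The helpers `logical_bytes` and `buffer_bytes` are hypothetical names, not part of accfft. The sketch assumes `alloc_max` is a byte count, since the suggestion above passes it straight to cudaMalloc. Taking the larger of the two sizes is a defensive choice of this sketch, not something the library is known to require.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of accfft): bytes needed to hold only the
// local real-space block of isize[0] x isize[1] x isize[2] doubles.
inline std::size_t logical_bytes(const int isize[3]) {
    assert(isize[0] >= 0 && isize[1] >= 0 && isize[2] >= 0);
    return static_cast<std::size_t>(isize[0]) * isize[1] * isize[2] * sizeof(double);
}

// Hypothetical helper: size to request for a device buffer that the library
// may also use as scratch space during transposes. alloc_max is assumed to be
// the byte count reported by the library for this decomposition.
inline std::size_t buffer_bytes(const int isize[3], std::size_t alloc_max) {
    return std::max(logical_bytes(isize), alloc_max);
}
```

In step1_gpu.cpp the allocation would then read cudaMalloc((void**) &data2, buffer_bytes(isize, alloc_max)); which never requests less than either the original line or the suggested one.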