
GPU accfft - gives error for certain decompositions

Open vydyanat opened this issue 6 years ago • 2 comments

Hi,

I downloaded the latest accfft yesterday and was able to install it without any errors. However, the GPU FFT on a Tesla P100 gives an error in the FFT computation for certain decompositions. Please see the dump below:

    mpirun -np 4 ./step1_gpu 512 512 512
    Input c_dim[0] * c_dims[1] != nprocs. Automatically switching to c_dims[0] = 2 , c_dims_1 = 2
    L1 Error of iFF(a)-a: 3826.97
    Relative L1 Error of iFF(a)-a: 0.0101613
    GPU Timing for FFT of size 512*512*512
    Setup 0.865407
    FFT 0.246016
    IFFT 0.333215

Each MPI process is assigned a different GPU. If I switch to a different decomposition, i.e., c_dims[0] = 4 and c_dims[1] = 1, the FFT computation is correct (see the dump below):

    mpirun -np 4 ./step1_gpu 512 512 512
    c_dims[0] = 4, c_dims_1 = 1
    L1 Error is 1.23944e-09
    Relative L1 Error is 3.29094e-15
    Results are CORRECT!
    GPU Timing for FFT of size 512*512*512
    Setup 1.79198
    FFT 0.142055
    IFFT 0.143942

Could you please help in resolving this?

regards, Naga

vydyanat, Aug 21 '18 10:08

Try changing line 76 of step1_gpu.cpp from

    cudaMalloc((void**) &data2, isize[0] * isize[1] * isize[2] * sizeof(double));

to

    cudaMalloc((void**) &data2, alloc_max);

This would fix any issues due to larger temporary space required during transposes.
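To illustrate why a buffer sized from isize alone can be too small, here is a rough back-of-the-envelope sketch in plain C++ (no CUDA or MPI needed). It is not accfft's actual alloc_max formula: the helper names and the intermediate pencil layout are assumptions made up for the illustration. It only relies on the fact that a real-to-complex transform stores n[2]/2 + 1 complex values along the last dimension, which does not divide evenly across ranks for n[2] = 512.

```cpp
#include <cstddef>

// Largest share any rank receives when n items are split over p ranks.
inline std::size_t ceil_div(std::size_t n, std::size_t p) {
    return (n + p - 1) / p;
}

// Bytes of the local real block, i.e. what
// isize[0] * isize[1] * isize[2] * sizeof(double) evaluates to when
// dimensions 0 and 1 are split over a c0 x c1 process grid (assumed layout).
inline std::size_t real_block_bytes(const std::size_t n[3],
                                    std::size_t c0, std::size_t c1) {
    return ceil_div(n[0], c0) * ceil_div(n[1], c1) * n[2] * sizeof(double);
}

// Bytes of a hypothetical intermediate complex pencil: dimension 0 is
// complete, dimension 1 is split over c0, and the n[2]/2 + 1 complex modes
// are split over c1. Each complex value occupies two doubles.
inline std::size_t complex_pencil_bytes(const std::size_t n[3],
                                        std::size_t c0, std::size_t c1) {
    const std::size_t modes = n[2] / 2 + 1;  // 257 for n[2] = 512
    return n[0] * ceil_div(n[1], c0) * ceil_div(modes, c1)
           * 2 * sizeof(double);
}
```

For n = {512, 512, 512} on a 2 x 2 grid, real_block_bytes gives 268435456 bytes while complex_pencil_bytes gives 270532608 bytes, so any temporary of the second kind would not fit in a buffer sized from isize. This shows only why the isize-based size can fall short, not why the 4 x 1 grid happened to pass.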

frobnitzem, Aug 15 '19 15:08

I wonder if this was also the reason for #15

pgrete, Sep 07 '19 16:09