Some differences in example 15 are large when running on an AMD GPU using OpenCL
15 - VkFFT / FFTW R2C+C2R precision test in single precision
VkFFT System: 1024x16384x1 avg_difference: 2.38e+06 max_difference: 2.47e+07 avg_eps: 2.01e+00 max_eps: 3.90e+06
VkFFT System: 32x1x1 avg_difference: 2.86e+07 max_difference: 6.58e+07 avg_eps: 4.85e+06 max_eps: 6.77e+07
VkFFT System: 64x1x1 avg_difference: 4.90e+07 max_difference: 1.49e+08 avg_eps: 3.07e+06 max_eps: 3.32e+07
VkFFT System: 128x1x1 avg_difference: 6.20e+07 max_difference: 2.72e+08 avg_eps: 2.61e+06 max_eps: 6.45e+07
VkFFT System: 256x1x1 avg_difference: 8.24e+07 max_difference: 3.54e+08 avg_eps: 4.17e+06 max_eps: 4.93e+08
VkFFT System: 512x1x1 avg_difference: 1.37e+08 max_difference: 6.94e+08 avg_eps: 1.38e+06 max_eps: 7.76e+07
VkFFT System: 1024x1x1 avg_difference: 2.10e+08 max_difference: 9.52e+08 avg_eps: 3.29e+06 max_eps: 1.04e+09
VkFFT System: 2048x1x1 avg_difference: 2.66e+08 max_difference: 1.47e+09 avg_eps: 1.75e+06 max_eps: 8.25e+08
VkFFT System: 4096x1x1 avg_difference: 2.61e+08 max_difference: 1.56e+09 avg_eps: 1.15e+06 max_eps: 1.48e+09
VkFFT System: 8192x1x1 avg_difference: 1.27e-03 max_difference: 6.35e-03 avg_eps: 1.67e-06 max_eps: 2.06e-03
VkFFT System: 16384x1x1 avg_difference: 2.47e-03 max_difference: 1.27e-02 avg_eps: 1.23e-06 max_eps: 1.06e-03
VkFFT System: 32768x1x1 avg_difference: 5.11e-03 max_difference: 2.73e-02 avg_eps: 1.63e-06 max_eps: 4.26e-03
VkFFT System: 65536x1x1 avg_difference: 1.10e-02 max_difference: 6.25e-02 avg_eps: 1.49e-06 max_eps: 1.94e-03
VkFFT System: 131072x1x1 avg_difference: 2.31e-02 max_difference: 1.41e-01 avg_eps: 2.35e-06 max_eps: 4.58e-02
VkFFT System: 262144x1x1 avg_difference: 4.82e-02 max_difference: 3.12e-01 avg_eps: 1.99e-06 max_eps: 3.61e-02
VkFFT System: 524288x1x1 avg_difference: 1.01e-01 max_difference: 5.94e-01 avg_eps: 2.70e-06 max_eps: 2.73e-01
VkFFT System: 1048576x1x1 avg_difference: 2.10e-01 max_difference: 1.31e+00 avg_eps: 2.69e-06 max_eps: 2.94e-01
VkFFT System: 2097152x1x1 avg_difference: 4.35e-01 max_difference: 3.00e+00 avg_eps: 4.29e-06 max_eps: 3.57e+00
VkFFT System: 4194304x1x1 avg_difference: 8.88e-01 max_difference: 6.25e+00 avg_eps: 3.90e-06 max_eps: 3.00e+00
VkFFT System: 8388608x1x1 avg_difference: 1.83e+00 max_difference: 1.20e+01 avg_eps: 4.70e-06 max_eps: 5.67e+00
VkFFT System: 16777216x1x1 avg_difference: 3.77e+00 max_difference: 2.60e+01 avg_eps: 3.24e-06 max_eps: 4.40e+00
VkFFT System: 33554432x1x1 avg_difference: 7.55e+00 max_difference: 5.80e+01 avg_eps: inf max_eps: inf
VkFFT System: 67108864x1x1 avg_difference: 1.22e+01 max_difference: 1.24e+02 avg_eps: inf max_eps: inf
VkFFT System: 8x8x1 avg_difference: 1.16e+07 max_difference: 1.27e+08 avg_eps: 1.20e+06 max_eps: 2.67e+07
VkFFT System: 8x16x1 avg_difference: 1.34e+07 max_difference: 2.29e+08 avg_eps: 3.49e+06 max_eps: 3.93e+08
VkFFT System: 8x32x1 avg_difference: 2.12e-05 max_difference: 7.63e-05 avg_eps: 5.70e-07 max_eps: 2.82e-05
VkFFT System: 8x64x1 avg_difference: 4.11e-05 max_difference: 1.68e-04 avg_eps: 1.07e-06 max_eps: 3.13e-04
VkFFT System: 8x128x1 avg_difference: 9.98e-05 max_difference: 5.49e-04 avg_eps: 7.59e-07 max_eps: 1.24e-04
VkFFT System: 8x256x1 avg_difference: 2.19e-04 max_difference: 9.77e-04 avg_eps: 1.01e-06 max_eps: 6.93e-04
VkFFT System: 8x512x1 avg_difference: 4.63e-04 max_difference: 2.20e-03 avg_eps: 9.78e-07 max_eps: 9.29e-04
VkFFT System: 8x1024x1 avg_difference: 9.94e-04 max_difference: 5.86e-03 avg_eps: 1.05e-06 max_eps: 8.16e-04
VkFFT System: 8x2048x1 avg_difference: 2.13e-03 max_difference: 1.17e-02 avg_eps: 1.31e-06 max_eps: 3.44e-03
VkFFT System: 8x4096x1 avg_difference: 4.47e-03 max_difference: 2.59e-02 avg_eps: 1.61e-06 max_eps: 7.12e-03
VkFFT System: 8x8192x1 avg_difference: 9.60e-03 max_difference: 5.66e-02 avg_eps: 2.65e-06 max_eps: 4.05e-02
VkFFT System: 8x16384x1 avg_difference: 2.03e-02 max_difference: 1.25e-01 avg_eps: 2.19e-06 max_eps: 7.95e-02
VkFFT System: 8x32768x1 avg_difference: 4.30e-02 max_difference: 2.93e-01 avg_eps: 2.12e-06 max_eps: 8.26e-02
VkFFT System: 8x65536x1 avg_difference: 8.86e-02 max_difference: 6.25e-01 avg_eps: 2.42e-06 max_eps: 1.30e-01
VkFFT System: 8x131072x1 avg_difference: 1.87e-01 max_difference: 1.25e+00 avg_eps: 2.60e-06 max_eps: 2.53e-01
VkFFT System: 8x262144x1 avg_difference: 3.95e-01 max_difference: 2.38e+00 avg_eps: 3.30e-06 max_eps: 2.19e+00
VkFFT System: 8x524288x1 avg_difference: 8.25e-01 max_difference: 5.75e+00 avg_eps: 2.31e-06 max_eps: 3.36e-01
VkFFT System: 8x1048576x1 avg_difference: 1.69e+00 max_difference: 1.20e+01 avg_eps: 2.59e-06 max_eps: 7.78e-01
VkFFT System: 8x2097152x1 avg_difference: 3.36e+00 max_difference: 2.41e+01 avg_eps: 2.81e-06 max_eps: 1.07e+00
VkFFT System: 8x4194304x1 avg_difference: 6.51e+00 max_difference: 5.00e+01 avg_eps: 3.35e-06 max_eps: 1.68e+01
VkFFT System: 8x8388608x1 avg_difference: 1.01e+01 max_difference: 1.04e+02 avg_eps: inf max_eps: inf
VkFFT System: 8x16777216x1 avg_difference: 1.60e+01 max_difference: 2.32e+02 avg_eps: inf max_eps: inf
VkFFT System: 8x8x1 avg_difference: 2.29e+07 max_difference: 3.94e+08 avg_eps: 1.67e+06 max_eps: 4.55e+07
VkFFT System: 16x16x1 avg_difference: 2.39e-05 max_difference: 9.16e-05 avg_eps: 1.13e-06 max_eps: 1.67e-04
VkFFT System: 32x32x1 avg_difference: 1.01e-04 max_difference: 4.88e-04 avg_eps: 1.13e-06 max_eps: 3.54e-04
VkFFT System: 64x64x1 avg_difference: 4.63e-04 max_difference: 2.44e-03 avg_eps: 1.51e-06 max_eps: 1.45e-03
VkFFT System: 128x128x1 avg_difference: 1.08e+03 max_difference: 3.13e+04 avg_eps: 2.47e-01 max_eps: 3.41e+02
VkFFT System: 256x256x1 avg_difference: nan max_difference: 0.00e+00 avg_eps: nan max_eps: 0.00e+00
VkFFT System: 512x512x1 avg_difference: nan max_difference: 0.00e+00 avg_eps: nan max_eps: 0.00e+00
VkFFT System: 1024x1024x1 avg_difference: 1.49e+05 max_difference: 1.48e+06 avg_eps: 1.50e+00 max_eps: 6.56e+04
VkFFT System: 2048x2048x1 avg_difference: 7.75e-01 max_difference: 5.50e+00 avg_eps: 2.62e-06 max_eps: 5.43e-01
VkFFT System: 4096x4096x1 avg_difference: -nan max_difference: inf avg_eps: -nan max_eps: inf
VkFFT System: 8192x8192x1 avg_difference: 1.05e+01 max_difference: 1.12e+02 avg_eps: inf max_eps: inf
Hello,
Something is off. It is true that this test sometimes reports high error values for big systems, but those come from the randomness of the input data: some of the values are close to 0 and hit the dynamic range of single precision. You can check this by uncommenting line 389, which prints both the FFTW and VkFFT values.
However, this should not be the case for systems like 32, 64, etc., and I can't reproduce the error on the Radeon VII I have. You have modified the first value of the test. Can you send the full file containing the benchmark 15 you use? Also, I will need to know which GPU you use, the ./Vulkan_FFT -devices output, and whether you made other modifications to the code.
Best regards, Dmitrii
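To illustrate the point above about values close to 0, here is a minimal sketch. It is not the code of sample 15: the function name `precision_stats` and the eps formula (absolute difference divided by the magnitude of the reference value) are assumptions made for illustration only.

```python
import math

def precision_stats(result, reference):
    """Hypothetical helper: (avg_difference, max_difference, avg_eps, max_eps).

    Assumes eps = |result - reference| / |reference|; the real sample may differ.
    """
    diffs = [abs(r - ref) for r, ref in zip(result, reference)]
    # A reference value of exactly 0 makes the relative error unbounded.
    eps = [d / abs(ref) if ref != 0.0 else math.inf
           for d, ref in zip(diffs, reference)]
    n = len(diffs)
    return sum(diffs) / n, max(diffs), sum(eps) / n, max(eps)

# Both reference values are far from 0: eps stays small.
well = precision_stats([100.001, -50.0005], [100.0, -50.0])

# Same absolute error, but one reference value is close to 0: max_eps explodes.
tiny = precision_stats([100.001, 1e-3], [100.0, 1e-9])

# A reference value of exactly 0 turns avg_eps and max_eps into inf.
zero = precision_stats([100.001, 1e-3], [100.0, 0.0])

for name, stats in (("well", well), ("tiny", tiny), ("zero", zero)):
    print(name, ["%.2e" % s for s in stats])
```

Under these assumptions, a large max_eps or an inf avg_eps next to a small avg_difference is expected for big systems. Absolute differences of 1e+07 and above, as reported for 32x1x1, cannot be explained this way.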
Hello,
Thank you! My device info is as follows:
ly@ly-PC:~/lzx/VkFFT/build$ ./Vulkan_FFT -devices
Platform id: 0
Device id: 0 name: AMD VERDE (DRM 2.50.0, 5.4.18-19-generic, LLVM 9.0.1) API:OpenCL 1.1 Mesa 20.0.8
And this is the benchmark 15 file. I only added the first case and changed fftw_ to fftwf_, since linking the double-precision FFTW library failed on my system. sample_15_precision_VkFFT_single_r2c.txt
Best regards, Zoghin
Hello,
This is quite an old GPU, so I will need more results to investigate the issue. Can you run the small systems and print the kernels and the results? I attach the sample 15 script.
sample_15_precision_VkFFT_single_r2c.txt
Best regards, Dmitrii
Some floating-point literals were missing their suffixes in v1.2.30, which was fixed in v1.2.31. Other than that, identical code runs on modern GPUs, so I am not sure how to debug this. Can you try v1.2.31 first to see if the problem is related to these suffix literals?
Best regards, Dmitrii
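For context on the suffix literals mentioned above: in C and OpenCL C, an unsuffixed literal such as 0.1 has type double, while 0.1f has type float. The sketch below only emulates that rounding difference in Python. The helper `to_f32` is hypothetical and not part of VkFFT, and this does not reproduce the v1.2.30 behaviour itself. Whether the literals are the cause on this GPU is still an open question.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

as_double = 0.1          # value denoted by an unsuffixed literal such as 0.1
as_single = to_f32(0.1)  # value denoted by a suffixed literal such as 0.1f

print(as_double == as_single)      # False: the two constants differ
print(abs(as_double - as_single))  # the gap is on the order of 1e-9
print(to_f32(0.5) == 0.5)          # True: exactly representable values are unaffected
```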