
FFT differences between CPU and ArrayFire (OpenCL, GPU)

ViralBShah opened this issue 8 years ago · 20 comments

julia> a = rand(Float32, 100,100);

julia> b = AFArray(a);

julia> fft(a)-fft(b);
ERROR: BoundsError: attempt to access (100,100)
  at index [3]
 in - at arraymath.jl:97

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

Perhaps we need some promotion rules - or at least better error messages, if mixed CPU/GPU operations are not supported.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

You cannot subtract an Array and an AFArray. I could of course change the default behavior to transferring to the CPU and then subtracting.

ranjanan avatar Jun 09 '16 15:06 ranjanan

Yeah, the error messages could be better I think. I might simply say -(a::Array, b::AFArray) = throw("Can't subtract Arrays and AFArrays")

ranjanan avatar Jun 09 '16 15:06 ranjanan
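That one-method idea could be extended to cover all the mixed-argument cases with a small metaprogramming loop. A hypothetical sketch (the operator list and error text are illustrative, not ArrayFire.jl's actual code):

```julia
using ArrayFire
import Base: +, -, *, /

# Sketch: define explicit error methods for every mixed Array/AFArray
# combination, in both argument orders, instead of letting the generic
# fallback produce an unhelpful BoundsError.
for op in (:+, :-, :*, :/)
    @eval ($op)(a::Array, b::AFArray) =
        error("mixed Array/AFArray operation; convert explicitly with AFArray(a) or Array(b)")
    @eval ($op)(a::AFArray, b::Array) =
        error("mixed Array/AFArray operation; convert explicitly with Array(a) or AFArray(b)")
end
```

The same loop would need to run over every binary function exported by the package, which is why a docs note is the lighter-weight fix.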

Capture in docs? Ideally, if someone tries to do these mixed operations, they should get a useful error. It is not enough to add just that one line. You'll need to add this comprehensively for all operations in all positions.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

I say let's just have a section in the docs in the compute model - that for purposes of performance, we do not automatically move data and it will be an error.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

Correct, I'd have to do this for all functions. I think I can capture this in the docs.

ranjanan avatar Jun 09 '16 15:06 ranjanan

That's quite a bit of difference, but perhaps one can't do much about it. And I am not even running on a real GPU - just on my mac.

julia> maximum(abs(fft(a)-Array(fft(b))))
0.00029078507f0

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

If you're using the OpenCL backend, that probably has to do with the difference in the values returned by clFFT and FFTW. I'm not sure what we can do about those numerical differences. They would also arise if you were using cuFFT on a GPU.

ranjanan avatar Jun 09 '16 15:06 ranjanan

Do you think it's worth noting in the README about the differences in the values?

ranjanan avatar Jun 09 '16 15:06 ranjanan

clFFT (which is probably also their CPU backend) is quite likely the culprit here ;) I coincidentally just tried out clFFT directly and got a very similar result: maximum(abs(a - b)) -> 0.000159

SimonDanisch avatar Jun 09 '16 15:06 SimonDanisch

CUDA backend seems to be in the same range:

julia> maximum(abs(af-Array(bf)))
0.0001501969f0

SimonDanisch avatar Jun 09 '16 15:06 SimonDanisch

Perhaps just one line to say that the numerical results may differ from the CPU versions, which may sometimes be more accurate. Users should adequately check their programs for correctness, as they always should.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah
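A relative-error check along these lines is one way to do that (a sketch; the 1f-5 tolerance is a plausible single-precision bound, not a guarantee):

```julia
using ArrayFire

a   = rand(Float32, 100, 100)
cpu = fft(a)
gpu = Array(fft(AFArray(a)))   # bring the device result back to the host

# Compare relative to the magnitude of the result, not in absolute terms
relerr = maximum(abs(cpu - gpu)) / maximum(abs(cpu))
relerr < 1f-5 || warn("FFT backends disagree beyond single-precision tolerance")
```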

I've added notes on both.

ranjanan avatar Jun 09 '16 15:06 ranjanan

How do you know the CPU is more accurate? Random rounding errors are random rounding errors. If your code depends on this, something is wrong.


alanedelman avatar Jun 09 '16 16:06 alanedelman

FFTs can be notoriously hard to verify numerically because

  1. floating point arithmetic is not associative. This means you can get different results simply by changing the number of threads you use on the CPU as well.
  2. it uses a lot of exponential operations.

~~Any numerical changes caused by (1) will result in larger changes by (2).~~

pavanky avatar Jun 09 '16 16:06 pavanky
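Point 1 is easy to demonstrate on its own: at Float32, the spacing between adjacent floats near 1f7 is about 1.0, so a small addend can vanish or survive depending purely on evaluation order:

```julia
# Floating-point addition is not associative: regrouping changes the result.
left  = (0.1f0 + 1f7) - 1f7   # 0.1f0 is absorbed into 1f7, leaving 0.0f0
right = 0.1f0 + (1f7 - 1f7)   # the large terms cancel first, leaving 0.1f0
left == right                  # false
```

A parallel FFT sums such terms in whatever order the hardware schedules them, so bit-identical results across backends should not be expected.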

Your assertions 1 and 2 are true, your conclusion is not.

alanedelman avatar Jun 09 '16 16:06 alanedelman

Welp, I should have worded it differently.

pavanky avatar Jun 09 '16 16:06 pavanky

fft's are not any more difficult to verify numerically than anything else

alanedelman avatar Jun 09 '16 16:06 alanedelman
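One common sanity check, for example, is the inverse-transform round trip (a sketch; note that in current Julia, `fft`/`ifft` live in the FFTW.jl package rather than Base):

```julia
a = rand(Float32, 100, 100)

# fft followed by ifft should recover the input up to rounding error;
# the residual should be on the order of a few ULPs of the signal magnitude.
roundtrip_err = maximum(abs(ifft(fft(a)) - a))
```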

I meant hard to verify for correctness in the way @ViralBShah was doing.

Anyway, according to the published specs, the accuracy for exponentials seems to be the following:

Note this is for single precision

pavanky avatar Jun 09 '16 16:06 pavanky

I was generally thinking, like many users probably will, that CPU libraries tend to have slightly higher accuracy in ULP terms, and that accelerated libraries give up a few last bits to be faster. Also, @stevengj's FFTW is a pretty good reference, I would have thought.

ViralBShah avatar Jun 09 '16 19:06 ViralBShah