
FFT differences between CPU and ArrayFire (OpenCL, GPU)

ViralBShah opened this issue 8 years ago · 20 comments

julia> a = rand(Float32, 100,100);

julia> b = AFArray(a);

julia> fft(a)-fft(b);
ERROR: BoundsError: attempt to access (100,100)
  at index [3]
 in - at arraymath.jl:97

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

Perhaps we need some promotion rules - or at least better error messages, if mixed CPU/GPU operations are not supported.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

You cannot subtract an Array and an AFArray. I could of course change the default behavior to transferring to the CPU and then subtracting.

ranjanan avatar Jun 09 '16 15:06 ranjanan

Yeah, the error messages could be better I think. I might simply say -(a::Array, b::AFArray) = throw("Can't subtract Arrays and AFArrays")

ranjanan avatar Jun 09 '16 15:06 ranjanan
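That one-method idea could be extended to cover all the mixed-argument cases with a small metaprogramming loop. A hypothetical sketch (the operator list and error text are illustrative, not ArrayFire.jl's actual code):

```julia
using ArrayFire
import Base: +, -, *, /

# Sketch: define explicit error methods for every mixed Array/AFArray
# combination, in both argument orders, instead of letting the generic
# fallback produce an unhelpful BoundsError.
for op in (:+, :-, :*, :/)
    @eval ($op)(a::Array, b::AFArray) =
        error("mixed Array/AFArray operation; convert explicitly with AFArray(a) or Array(b)")
    @eval ($op)(a::AFArray, b::Array) =
        error("mixed Array/AFArray operation; convert explicitly with Array(a) or AFArray(b)")
end
```

The same loop would need to run over every binary function exported by the package, which is why a docs note is the lighter-weight fix.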

Capture in docs? Ideally, if someone tries to do these mixed operations, they should get a useful error. It is not enough to add just that one line. You'll need to add this comprehensively for all operations in all positions.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

I say let's just have a section in the docs in the compute model - that for purposes of performance, we do not automatically move data and it will be an error.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

Correct, I'd have to do this for all functions. I think I can capture this in the docs.

ranjanan avatar Jun 09 '16 15:06 ranjanan

That's quite a bit of difference, but perhaps one can't do much about it. And I am not even running on a real GPU - just on my mac.

julia> maximum(abs(fft(a)-Array(fft(b))))
0.00029078507f0

ViralBShah avatar Jun 09 '16 15:06 ViralBShah

If you're using the OpenCL backend, that probably has to do with the difference in the values returned by clFFT and FFTW. I'm not sure what we can do about those numerical differences. They would also arise if you were using cuFFT on a GPU.

ranjanan avatar Jun 09 '16 15:06 ranjanan

Do you think it's worth noting in the README about the differences in the values?

ranjanan avatar Jun 09 '16 15:06 ranjanan

clFFT (which is probably also their CPU backend) is quite likely the culprit here ;) I coincidentally just tried out clFFT directly and got a very similar result: maximum(abs(a - b)) -> 0.000159

SimonDanisch avatar Jun 09 '16 15:06 SimonDanisch

CUDA backend seems to be in the same range:

julia> maximum(abs(af-Array(bf)))
0.0001501969f0

SimonDanisch avatar Jun 09 '16 15:06 SimonDanisch

Perhaps just one line to say that the numerical results may differ from the CPU versions, which may sometimes be more accurate. Users should adequately check their programs for correctness, as they always should.

ViralBShah avatar Jun 09 '16 15:06 ViralBShah
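A relative-error check along these lines is one way to do that (a sketch; the 1f-5 tolerance is a plausible single-precision bound, not a guarantee):

```julia
using ArrayFire

a   = rand(Float32, 100, 100)
cpu = fft(a)
gpu = Array(fft(AFArray(a)))   # bring the device result back to the host

# Compare relative to the magnitude of the result, not in absolute terms
relerr = maximum(abs(cpu - gpu)) / maximum(abs(cpu))
relerr < 1f-5 || warn("FFT backends disagree beyond single-precision tolerance")
```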

I've added notes on both.

ranjanan avatar Jun 09 '16 15:06 ranjanan

How do you know the CPU is more accurate? Random rounding errors are random rounding errors. If your code depends on this, something is wrong.


alanedelman avatar Jun 09 '16 16:06 alanedelman

FFTs can be notoriously hard to verify numerically because

  1. floating point arithmetic is not associative. This means you can get different results simply by changing the number of threads you use on the CPU as well.
  2. it uses a lot of exponential operations.

~~Any numerical changes caused by (1) will result in larger changes by (2).~~

pavanky avatar Jun 09 '16 16:06 pavanky
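Point 1 is easy to demonstrate on its own: at Float32, the spacing between adjacent floats near 1f7 is about 1.0, so a small addend can vanish or survive depending purely on evaluation order:

```julia
# Floating-point addition is not associative: regrouping changes the result.
left  = (0.1f0 + 1f7) - 1f7   # 0.1f0 is absorbed into 1f7, leaving 0.0f0
right = 0.1f0 + (1f7 - 1f7)   # the large terms cancel first, leaving 0.1f0
left == right                  # false
```

A parallel FFT sums such terms in whatever order the hardware schedules them, so bit-identical results across backends should not be expected.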

Your assertions 1 and 2 are true, your conclusion is not.

alanedelman avatar Jun 09 '16 16:06 alanedelman

Welp, I should have worded it differently.

pavanky avatar Jun 09 '16 16:06 pavanky

fft's are not any more difficult to verify numerically than anything else

alanedelman avatar Jun 09 '16 16:06 alanedelman
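One common sanity check, for example, is the inverse-transform round trip (a sketch; note that in current Julia, `fft`/`ifft` live in the FFTW.jl package rather than Base):

```julia
a = rand(Float32, 100, 100)

# fft followed by ifft should recover the input up to rounding error;
# the residual should be on the order of a few ULPs of the signal magnitude.
roundtrip_err = maximum(abs(ifft(fft(a)) - a))
```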

I meant hard to verify for correctness in the way @ViralBShah was doing.

Anyway, according to the published specs, the accuracy for exponentials seems to be the following:

Note this is for single precision

pavanky avatar Jun 09 '16 16:06 pavanky

I was generally thinking, like many users probably will, that CPU libraries tend to have slightly higher accuracy in ULP terms, and that accelerated libraries give up a few last bits to be faster. Also, @stevengj's FFTW is a pretty good reference, I would have thought.

ViralBShah avatar Jun 09 '16 19:06 ViralBShah