ArrayFire.jl
FFT differences between CPU and ArrayFire (OpenCL, GPU)
julia> a = rand(Float32, 100,100);
julia> b = AFArray(a);
julia> fft(a)-fft(b);
ERROR: BoundsError: attempt to access (100,100)
  at index [3]
 in - at arraymath.jl:97
Perhaps we need some promotion rules - or at least better error messages, if mixed CPU/GPU operations are not supported.
You cannot subtract an Array and an AFArray. I could of course change the default behavior to transferring to the CPU and then subtracting.
Yeah, the error messages could be better I think. I might simply say
-(a::Array, b::AFArray) = throw("Can't subtract Arrays and AFArrays")
Capture in docs? Ideally, if someone tries to do these mixed operations, they should get a useful error. It is not enough to add just that one line. You'll need to add this comprehensively for all operations in all positions.
I say let's just have a section in the docs in the compute model - that for purposes of performance, we do not automatically move data and it will be an error.
Correct, I'd have to do this for all functions. I think I can capture this in the docs.
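To make the idea concrete, here's a minimal, hypothetical sketch. The `AFArray` below is a stand-in stub, not ArrayFire.jl's actual type, and the error text is illustrative; it just shows what a descriptive error method for one mixed operation could look like:

```julia
# Hypothetical stub standing in for ArrayFire.jl's AFArray, purely to
# illustrate the error-method idea; not the real type.
struct AFArray{T,N}
    data::Array{T,N}
end

import Base: -

# Mixed CPU/GPU subtraction raises a descriptive error instead of a BoundsError.
-(a::Array, b::AFArray) = error("Cannot subtract an Array and an AFArray; convert explicitly, e.g. a - Array(b)")
-(a::AFArray, b::Array) = error("Cannot subtract an AFArray and an Array; convert explicitly, e.g. Array(a) - b")
```

As noted above, this would need to be repeated for every operation in every argument position, which is why documenting the compute model may be the more practical route.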
That's quite a bit of difference, but perhaps one can't do much about it. And I am not even running on a real GPU - just on my mac.
julia> maximum(abs(fft(a)-Array(fft(b))))
0.00029078507f0
If you're using the OpenCL backend, that probably has to do with the difference in the values returned by clFFT and FFTW. I'm not sure what we can do about those numerical differences. They would also arise if you were using cuFFT on a GPU.
Do you think it's worth noting in the README about the differences in the values?
clFFT (which is probably also their CPU backend) is quite likely the culprit here ;)
I coincidentally just tried out clFFT directly and got a very similar result:
maximum(abs(a - b)) -> 0.000159
CUDA backend seems to be in the same range:
julia> maximum(abs(af-Array(bf)))
0.0001501969f0
Perhaps just one line to say that the numerical results may differ from the CPU versions, which may sometimes be more accurate. Users should adequately check their programs for correctness, as they always should.
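One way to do such a check (a sketch, assuming the FFTW.jl package provides the CPU `fft`) is to compare the single-precision result against a double-precision reference and look at the relative rather than the absolute error:

```julia
using FFTW  # CPU FFT, assumed available; on a GPU backend the result under test would come from clFFT/cuFFT

a   = rand(Float32, 100, 100)
fa  = fft(a)             # single-precision result under test
ref = fft(Float64.(a))   # double-precision reference

# The relative error is the meaningful quantity; absolute differences like
# the ~3e-4 quoted above are tiny compared to the magnitude of the spectrum.
relerr = maximum(abs.(fa .- ref)) / maximum(abs.(ref))
```

For a 100x100 uniform random input the DC component alone is on the order of 5000, so an absolute difference of 3e-4 corresponds to a relative error near 1e-7, well within what one expects from single precision.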
I've added notes on both.
How do you know the CPU is more accurate? Random rounding errors are random rounding errors; if your code depends on this, something is wrong.
FFTs can be notoriously hard to verify numerically because
- floating-point arithmetic is not associative. This means you can get different results simply by changing the number of threads you use on the CPU as well.
- they use a lot of exponential operations.
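Point (1) is easy to see in isolation; floating-point addition fails associativity even in this tiny Float64 example:

```julia
# Regrouping the same three terms changes the rounded result.
x = (0.1 + 0.2) + 0.3   # 0.6000000000000001
y = 0.1 + (0.2 + 0.3)   # 0.6
x == y                  # false
```

A parallel FFT effectively regroups sums like this depending on how the work is split, so bit-for-bit agreement between implementations should not be expected.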
~~Any numerical changes caused by (1) will result in larger changes by (2).~~
Your assertions 1 and 2 are true, your conclusion is not.
Welp, I should have worded it differently.
FFTs are not any more difficult to verify numerically than anything else.
I meant hard to verify for correctness in the way @ViralBShah was doing.
Anyway, according to the published specs, the accuracy for exponentials seems to be the following:
Note this is for single precision
I was generally thinking, as many users probably will, that CPU libraries tend to have slightly higher accuracy (in ULPs and such) and that accelerated libraries give up some last bits to be faster. Also, @stevengj's FFTW is a pretty good reference, I would have thought.