ArrayFire.jl

mean and var test failing sometimes

Open mauro3 opened this issue 9 years ago • 8 comments

Most of the time (although not always) the mean and var tests fail:

...v0.4/ArrayFire/test(master)  >> julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-unknown-linux-gnu

julia> include("runtests.jl")
Device[0] has no support for OpenGL Interoperation
ERROR: LoadError: test failed: 0.090069480240345 == 0.090069495f0
 in expression: var(ad) == var(a)
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 47

julia> include("runtests.jl")
ERROR: LoadError: test failed: 0.5107571f0 == 0.51075715f0
 in expression: mean(ad) == mean(a)
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 42

Also note that it is a bit strange (wrong?) that var returns a double (Float64) here, even though the input is a Float32 array.

I'm running on the built-in Intel HD graphics using Beignet but it also happens using the CPU backend.

mauro3 • Jun 10 '16 13:06

You're right, var should return a Float32 if it's a Float32 Array. I'll fix that.
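For reference, a minimal sketch of the type behaviour being asked for, shown on a plain Julia Array rather than an ArrayFire array. It assumes Julia 1.x, where mean and var live in the Statistics standard library (on Julia 0.4 they are in Base); the variable name is only illustrative.

```julia
using Statistics  # Julia 1.x; on Julia 0.4 mean and var are in Base

a = rand(Float32, 100)

# For a plain Array{Float32}, both reductions keep the element type,
# which is the behaviour expected from the device array as well.
println(typeof(mean(a)))  # Float32
println(typeof(var(a)))   # Float32
```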

ranjanan • Jun 10 '16 13:06

My tests haven't ever failed though. This is quite strange: does it have something to do with the message "Device[0] has no support for OpenGL Interoperation"? I've never seen that message come up on my system.

That's a Beignet issue I guess:

OpenGL-OpenCL interop via cl_khr_gl_sharing is not supported

ranjanan • Jun 11 '16 10:06

Looks like a floating point inaccuracy. And they are intermittent on my computer. Is it guaranteed that the floats get summed in the same order at all times? I suspect not if it's done in parallel. So, why not use @test_approx_eq?
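A small sketch of why the order matters, using plain Julia arrays only (no ArrayFire involved). The array size and the tolerance are arbitrary choices for illustration. Exact equality between the two sums may or may not hold for a given random draw, while a relative-tolerance comparison holds regardless of order.

```julia
# Sum the same Float32 values in two different orders.
a = rand(Float32, 1000)

s_forward = foldl(+, a)            # strict left-to-right accumulation
s_reverse = foldl(+, reverse(a))   # same values, opposite order

# Exact comparison: the result depends on the data, so a test written
# with == can pass on one run and fail on the next.
println(s_forward == s_reverse)

# Approximate comparison: robust to the rounding differences that a
# changed summation order introduces.
println(isapprox(s_forward, s_reverse; rtol = 1f-3))
```

On Julia 0.4 the corresponding test macro is @test_approx_eq from Base.Test; later Julia versions write the same check as @test x ≈ y.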

mauro3 • Jun 11 '16 12:06

Probably. I haven't used Beignet before, so if those floating point errors are possible on that platform, I suppose @test_approx_eq makes sense.

Does that work for your platform?

ranjanan • Jun 11 '16 12:06

Yes, it works. But running the tests many times gives some other odd errors, such as:

Device[0] has no support for OpenGL Interoperation
ERROR: LoadError: test failed: 5.420249f0 < 0.0001
 in expression: sumabs(Array(ud) - u) < 0.0001
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 60

The one above is particularly bad, as 5.420249f0 is far larger than the 0.0001 tolerance.

More commonly it's something like this (about once in 30 test runs), where the tolerance is missed only by a small margin:

ERROR: LoadError: test failed: 1.0910457f-5 < 1.0e-5
 in expression: sumabs2(Array(chol(ad * ad')) - chol(a * a')) < 1.0e-5
 in error at ./error.jl:21
 in default_handler at test.jl:28
 in do_test at test.jl:53
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 [inlined code] from none:3
 in anonymous at no file:0
while loading /home/mauro/.julia/v0.4/ArrayFire/test/runtests.jl, in expression starting on line 56

Do you want to try to track this down?

mauro3 • Jun 11 '16 19:06

~~This is likely a problem in arrayfire or beignet.~~

Never mind. If it is also happening on the CPU backend, then it might be something different.

pavanky • Jun 11 '16 20:06

If you are seeing accuracy problems on Beignet, it may be because of this:

Precision issue.
Currently Gen does not provide native support of high precision math functions
required by OpenCL. We provide a software version to achieve high precision, 
which you can turn on through

# export OCL_STRICT_CONFORMANCE=1.

But be careful, this would make your CL kernel run a little longer.

Source

pavanky • Jun 11 '16 20:06

I'll check whether running on the CPU backend works and whether that Beignet setting helps. Thanks.

mauro3 • Jun 11 '16 21:06