MultiFloats.jl

Some rounding error issue between CPU & GPU

Open haampie opened this issue 3 years ago • 4 comments

I've tried MultiFloats.jl on the GPU, but I'm getting a loss of precision compared to the CPU:

using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
# Difference between the CPU product and the GPU product (copied back to the host)
A * B - Array(CuArray(A) * CuArray(B))

Gives me

100×100 Matrix{MultiFloat{Float64, 8}}:
 -1.23827e-98    9.35263e-99  -8.83181e-99  …  -4.70324e-99   -1.3348e-98
 -1.98421e-99    8.20389e-99   1.67043e-98      1.45499e-98    2.32225e-98
 -2.77264e-99   -3.30951e-99   1.32426e-98     -1.09181e-98    7.84157e-100
  1.92544e-98    6.35776e-99  -8.85547e-99      1.29435e-98   -4.89252e-99
 -5.52038e-99    5.35901e-99  -3.705e-98        1.53947e-99    7.38954e-99
 -2.16904e-98    1.64505e-98  -1.16536e-98  …  -3.19036e-98    7.5397e-99
  6.72487e-98    6.07349e-99  -2.87359e-98      ...

but eps(Float64x8) is 5.9091063153828709e-126.

What explains this? The order of iteration?
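As a point of reference for the "order of iteration" idea, here is a minimal sketch in plain Float64 (no GPU, no MultiFloats.jl) showing that regrouping the same terms changes the rounded result. The variable names are made up for illustration. It only demonstrates the effect; it does not show that reordering is the cause of the gap above.

```julia
# Floating-point addition is not associative, so the order in which the
# terms of a dot product are accumulated can change the last bits.
a = (0.1 + 0.2) + 0.3   # left-to-right accumulation
b = 0.1 + (0.2 + 0.3)   # same terms, different grouping

println(a == b)                       # false in IEEE-754 double precision
println(abs(a - b) <= eps(Float64))   # true: the gap is on the order of eps
```

Reordering a sum of positive terms typically perturbs only the last few bits, so on its own it would be a surprising explanation for a difference many orders of magnitude above eps(Float64x8).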

haampie, Apr 25 '21 15:04

Thanks for the find, @haampie! This is very mysterious to me. I haven't played with Julia/CUDA interop before, so I'm unsure how Julia code gets compiled for GPU execution. The fact that most of the limbs are accurate is especially puzzling: if CUDA's arithmetic/fma operations were simply improperly rounded, you would expect all of limbs 2-8 to be garbage. But the fact that limbs 1-6 are right, while limbs 7-8 are wrong, rules out all of my easy hypotheses for what could be going wrong.
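The "limbs 7-8" observation can be sanity-checked with a little arithmetic on the numbers from the opening post. This is a rough sketch, assuming the matrix entries are of order 1 and that each Float64 limb contributes about 53 significand bits; the variable names are illustrative only.

```julia
# Values copied from the opening post.
err    = 1.23827e-98                # one entry of the CPU-GPU difference
eps_x8 = 5.9091063153828709e-126    # eps(Float64x8) as reported there

# Number of trailing bits affected, treating the entries as O(1).
bits_lost = log2(err / eps_x8)

# Assumption: roughly 53 significand bits per Float64 limb.
limbs_lost = bits_lost / 53

println(round(bits_lost; digits = 1))    # about 90.8 bits
println(round(limbs_lost; digits = 1))   # about 1.7 limbs
```

Roughly 91 bits is a little under two limbs, which is consistent with the last two of the eight limbs being affected while the first six agree.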

I'll look into this the next time I work on MultiFloats.jl (which honestly might take a while... grad student life has me swamped these days)

dzhang314, Apr 29 '21 03:04

Just to add another data point: I ran exactly the same code on my GPU (a 1080 Ti), and I am getting an accuracy of 1e-114. Is this the expected accuracy level?

100×100 Matrix{MultiFloat{Float64, 8}}:
 -1.14468e-115 2.54748e-115 -6.94563e-115 -4.96936e-115 2.13842e-115 8.64325e-115 1.6073e-115 … 3.4182e-115 6.56281e-115 3.55884e-115 4.6424e-115 -1.1596e-115 2.23816e-115 ...

kunyuan, Mar 07 '22 12:03

Hey @kunyuan, thanks for your interest in MultiFloats.jl! No, this is not the expected accuracy, and I'm afraid to report I still don't understand what's going on with GPU MultiFloat calculations. I'll report back when I have time to take a look at this in detail.

dzhang314, Mar 08 '22 04:03

Hi @dzhang314, do you know CAMPARY? It is a library whose idea is similar to your MultiFloats library, but made to work on the GPU. I did a reimplementation in Julia some time ago, on the CPU, so it is quite similar to your library. If you're interested, I can try to dust it off and compare it with MultiFloats...

orkolorko, Jun 20 '22 07:06