MultiFloats.jl
Some rounding error issue between CPU & GPU
I've tried MultiFloats.jl on the GPU, but I'm getting a loss of precision compared to the CPU:
using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
A * B - Array(CuArray(A) * CuArray(B))
Gives me
100×100 Matrix{MultiFloat{Float64, 8}}:
-1.23827e-98 9.35263e-99 -8.83181e-99 … -4.70324e-99 -1.3348e-98
-1.98421e-99 8.20389e-99 1.67043e-98 1.45499e-98 2.32225e-98
-2.77264e-99 -3.30951e-99 1.32426e-98 -1.09181e-98 7.84157e-100
1.92544e-98 6.35776e-99 -8.85547e-99 1.29435e-98 -4.89252e-99
-5.52038e-99 5.35901e-99 -3.705e-98 1.53947e-99 7.38954e-99
-2.16904e-98 1.64505e-98 -1.16536e-98 … -3.19036e-98 7.5397e-99
6.72487e-98 6.07349e-99 -2.87359e-98 ...
but eps(Float64x8) is 5.9091063153828709e-126.
What explains this? The order of iteration?
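On the order-of-iteration question, a minimal sketch with plain Float64 values (not MultiFloats, and not run on a GPU) shows the scale of discrepancy that reordering a sum usually produces. The variable names are made up for this illustration. Reordering typically shifts a result by a few units in the last place, whereas the differences reported above are more than twenty orders of magnitude larger than eps(Float64x8), so ordering alone seems unlikely to account for them.

```julia
# Sum the same three Float64 terms in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Express the discrepancy in units of eps at the magnitude of the result.
ulps = abs(left_to_right - right_to_left) / eps(right_to_left)

println((left_to_right, right_to_left, ulps))
```

The same kind of measurement, dividing the largest entry of the CPU/GPU difference by eps at the magnitude of the product entries, would put the reported discrepancy on a comparable scale.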
Thanks for the find @haampie ! This is very mysterious to me. I haven't played with Julia/CUDA interop before, so I'm unsure how Julia code gets compiled for GPU execution. The fact that most of the limbs are accurate is especially puzzling; if it was simply the case that CUDA's arithmetic/fma operations are improperly rounded, then you would expect all of limbs 2-8 to be garbage. But the fact that limbs 1-6 are right, while limbs 7-8 are wrong, rules out all of my easy hypotheses for what could be going wrong.
I'll look into this the next time I work on MultiFloats.jl (which honestly might take a while... grad student life has me swamped these days)
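To make the rounding hypothesis above concrete: multi-limb arithmetic of this kind is typically built from error-free transformations, which only hold if each addition, multiplication, and fma is correctly rounded. The sketch below uses plain Float64 on the CPU. The names `two_sum` and `two_prod` are chosen for this illustration and are not taken from the MultiFloats.jl source.

```julia
# Illustrative error-free transformations on plain Float64 values.
# `two_sum` and `two_prod` are hypothetical names for this sketch.

# Returns (s, e) with s = fl(a + b) and s + e == a + b exactly,
# assuming round-to-nearest and no overflow.
function two_sum(a::Float64, b::Float64)
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e
end

# Returns (p, e) with p = fl(a * b) and p + e == a * b exactly,
# assuming a correctly rounded fma and no overflow or underflow.
function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end

s, es = two_sum(1.0, 1e-20)
p, ep = two_prod(1 / 3, 3.0)

# Check both identities against BigFloat arithmetic (256-bit by default).
sum_exact = big(s) + big(es) == big(1.0) + big(1e-20)
prod_exact = big(p) + big(ep) == big(1 / 3) * big(3.0)

println((sum_exact, prod_exact))
```

If either identity failed on a given device, the error terms would be wrong from the first step, which is why a rounding defect would be expected to corrupt every limb after the first rather than only the last two.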
Just to add another data point: I ran exactly the same code on my GPU (1080 Ti), and I am getting an accuracy of about 1e-114. Is this the expected accuracy level?
100×100 Matrix{MultiFloat{Float64, 8}}:
-1.14468e-115 2.54748e-115 -6.94563e-115 -4.96936e-115 2.13842e-115 8.64325e-115 1.6073e-115 … 3.4182e-115 6.56281e-115 3.55884e-115 4.6424e-115 -1.1596e-115 2.23816e-115 ...
Hey @kunyuan, thanks for your interest in MultiFloats.jl! No, this is not the expected accuracy, and I'm afraid to report I still don't understand what's going on with GPU MultiFloat calculations. I'll report back when I have time to take a look at this in detail.
Hi @dzhang314, do you know CAMPARY? It is a library based on an idea similar to your MultiFloats library, but made to work on the GPU. I wrote a reimplementation in Julia some time ago, for the CPU, so it is quite similar to your library. If you're interested, I can try to dust it off and compare it with MultiFloats...