Inconsistent result of `KokkosBlas::nrm2` for `CudaSpace` and `CudaUVMSpace`

Open romintomasetti opened this issue 3 years ago • 0 comments

The following code snippet produces inconsistent results from `KokkosBlas::nrm2` depending on whether the view lives in `Kokkos::CudaSpace` or `Kokkos::CudaUVMSpace`.

It seems that:

- when the view is in `Kokkos::CudaUVMSpace`, `KokkosBlas::nrm2` dispatches to KokkosBlas's own functor;
- when the view is in `Kokkos::CudaSpace`, it calls `cublasDnrm2`.
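To illustrate the kind of mechanism this observation suggests, here is a minimal sketch of compile-time back-end selection by memory space. It is not the actual KokkosKernels code: `DeviceSpace`, `ManagedSpace`, `Backend` and `Nrm2Backend` are hypothetical names standing in for the real types, and the assumption is simply that the vendor path is specialized for one memory space only.

```cpp
// Hypothetical stand-ins for Kokkos::CudaSpace and Kokkos::CudaUVMSpace.
struct DeviceSpace {};
struct ManagedSpace {};

// Which implementation a call would end up in (hypothetical).
enum class Backend { GenericFunctor, VendorLibrary };

// Primary template: no vendor specialization exists for this memory
// space, so the library's own functor is selected.
template <class MemSpace>
struct Nrm2Backend {
    static constexpr Backend value = Backend::GenericFunctor;
};

// Full specialization: the vendor path is only provided for the
// plain device memory space in this sketch.
template <>
struct Nrm2Backend<DeviceSpace> {
    static constexpr Backend value = Backend::VendorLibrary;
};
```

Under this assumption, `Nrm2Backend<ManagedSpace>::value` falls through to `Backend::GenericFunctor`, which would explain why two views holding identical data can take different code paths.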

The results differ because KokkosBlas's functor implements the "naïve" approach (possibly related to #1076), whereas cuBLAS uses a multiphase accumulation model to avoid intermediate underflow and overflow (https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-nrm2).
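For readers unfamiliar with the distinction, here is a host-only sketch contrasting a naïve sum of squares with a scaled accumulation in the style of the classic reference BLAS `dnrm2`. This is an assumption-laden illustration: it is neither the KokkosBlas functor nor cuBLAS's actual multiphase algorithm, and the function names `nrm2_naive` and `nrm2_scaled` are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive 2-norm: accumulate x_i^2 directly, then take the square root.
// Each x_i^2 can overflow or underflow even when the norm itself is
// representable.
double nrm2_naive(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return std::sqrt(sum);
}

// Scaled 2-norm: keep the largest magnitude seen so far in `scale`
// and accumulate the sum of squares of x_i / scale in `ssq`, so that
// every squared term is at most 1.
double nrm2_scaled(const std::vector<double>& x) {
    double scale = 0.0;
    double ssq = 1.0;
    for (double v : x) {
        assert(std::isfinite(v));  // this sketch assumes finite inputs
        const double a = std::fabs(v);
        if (a == 0.0) continue;
        if (scale < a) {
            // New maximum: rescale the running sum to the new scale.
            const double r = scale / a;
            ssq = 1.0 + ssq * r * r;
            scale = a;
        } else {
            const double r = a / scale;
            ssq += r * r;
        }
    }
    return scale * std::sqrt(ssq);
}
```

For `{1e200, 1e200}` the naïve version returns infinity because `1e200 * 1e200` overflows, while the scaled version returns a finite value close to `1.414e200`. The two approaches perform different floating-point operations in a different order, so small last-digit differences between them on ordinary inputs are not surprising.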

Here is the snippet:

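#include <iomanip>
#include <iostream>
#include <cublas_v2.h>
#include <Kokkos_Core.hpp>
#include <KokkosBlas1_nrm2.hpp>

using scalar_t = double;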
using mem_space_A = Kokkos::CudaSpace;
using mem_space_B = Kokkos::CudaUVMSpace;

using view_t_A = Kokkos::View<scalar_t*,mem_space_A>;
using view_t_B = Kokkos::View<scalar_t*,mem_space_B>;

view_t_A a("view without UVM",10000); Kokkos::deep_copy(a,1.0);
view_t_B b("view with    UVM",10000); Kokkos::deep_copy(b,1.0);

scalar_t a_nrm2 = KokkosBlas::nrm2(a);
scalar_t b_nrm2 = KokkosBlas::nrm2(b);

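// Call cuBLAS directly on both views for reference (return statuses are not checked here).
scalar_t res_a, res_b;
cublasHandle_t handle;
cublasCreate(&handle);
cublasDnrm2(handle,static_cast<int>(a.size()),a.data(),1,&res_a);
cublasDnrm2(handle,static_cast<int>(b.size()),b.data(),1,&res_b);
cublasDestroy(handle);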

std::cout << std::setprecision(25) << std::scientific;
std::cout << "> Nrm2 of a with KokkosBlas : " << a_nrm2 << std::endl;
std::cout << "> Nrm2 of b with KokkosBlas : " << b_nrm2 << std::endl;
std::cout << "> Nrm2 of a with cublasDnrm2: " << res_a  << std::endl;
std::cout << "> Nrm2 of b with cublasDnrm2: " << res_b  << std::endl;

This gives:

> Nrm2 of a with KokkosBlas : 1.0000000000000001421085472e+02
> Nrm2 of b with KokkosBlas : 1.0000000000000000000000000e+02
> Nrm2 of a with cublasDnrm2: 1.0000000000000001421085472e+02
> Nrm2 of b with cublasDnrm2: 1.0000000000000001421085472e+02
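To put the size of the discrepancy in perspective, the two distinct values printed above can be compared in units in the last place (ULP). The helper below is a small sketch written for this purpose (`ulp_distance` is a hypothetical name, not part of Kokkos or cuBLAS), and it assumes both inputs are finite, positive IEEE 754 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Number of representable doubles separating two finite, positive
// values. For positive IEEE 754 doubles the bit patterns, read as
// integers, are ordered the same way as the values themselves.
std::int64_t ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b) && a > 0.0 && b > 0.0);
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}
```

`ulp_distance(1.0000000000000001421085472e+02, 100.0)` returns 1: the spacing of doubles around 100 is 2^-46 ≈ 1.42e-14, so the two reported norms are adjacent representable values.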

Jul 14 '22 08:07 romintomasetti