kokkos-kernels
Inconsistent result of `KokkosBlas::nrm2` for `CudaSpace` and `CudaUVMSpace`
The following code snippet produces inconsistent results from `KokkosBlas::nrm2` depending on whether the view lives in `Kokkos::CudaSpace` or `Kokkos::CudaUVMSpace`.
It seems that:
- when the view is in the `Kokkos::CudaUVMSpace` memory space, `KokkosBlas::nrm2` branches to KokkosBlas's own functor,
- when the view is in the `Kokkos::CudaSpace` memory space, it uses `cublasDnrm2`.
The results differ because KokkosBlas's functor implements the "naïve" approach (possibly related to #1076), whereas cuBLAS uses a multiphase model of accumulation to avoid intermediate underflow and overflow (https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-nrm2).
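To make the distinction concrete, here is a minimal host-side sketch of a naïve accumulation next to a scaled one. It is an illustration only: `nrm2_naive` and `nrm2_scaled` are hypothetical names, the scaled variant follows the classic reference-BLAS `dnrm2` scale/sum-of-squares scheme, and neither is claimed to match the actual KokkosBlas functor or the cuBLAS implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive 2-norm: accumulate x_i^2 directly, then take the square root.
// The running sum can overflow or underflow even when the norm itself
// is representable.
double nrm2_naive(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += v * v;
  return std::sqrt(sum);
}

// Scaled 2-norm (reference-BLAS style, an assumption for illustration):
// keeps scale = max |x_i| seen so far and ssq = sum (x_i / scale)^2,
// so the result is scale * sqrt(ssq) and no x_i^2 is ever formed.
double nrm2_scaled(const std::vector<double>& x) {
  double scale = 0.0;
  double ssq = 1.0;
  for (double v : x) {
    if (v == 0.0) continue;
    const double a = std::fabs(v);
    if (scale < a) {
      // New maximum: rescale the accumulated sum of squares.
      const double r = scale / a;
      ssq = 1.0 + ssq * r * r;
      scale = a;
    } else {
      const double r = a / scale;
      ssq += r * r;
    }
  }
  return scale * std::sqrt(ssq);
}
```

With four entries equal to `1e200`, `nrm2_naive` returns infinity because each square overflows, while `nrm2_scaled` stays finite. For the all-ones input used in this issue, both sequential sketches return exactly 100, so they do not reproduce the last-digit difference reported here.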
Here is the snippet:
using scalar_t = double;
using mem_space_A = Kokkos::CudaSpace;
using mem_space_B = Kokkos::CudaUVMSpace;
using view_t_A = Kokkos::View<scalar_t*,mem_space_A>;
using view_t_B = Kokkos::View<scalar_t*,mem_space_B>;
view_t_A a("view without UVM",10000);
Kokkos::deep_copy(a,1.0);
view_t_B b("view with UVM",10000);
Kokkos::deep_copy(b,1.0);
scalar_t a_nrm2 = KokkosBlas::nrm2(a);
scalar_t b_nrm2 = KokkosBlas::nrm2(b);
scalar_t res_a, res_b;
cublasHandle_t handle;
cublasCreate(&handle);
cublasDnrm2(handle,a.size(),a.data(),1,&res_a);
cublasDnrm2(handle,b.size(),b.data(),1,&res_b);
cublasDestroy(handle); // release the cuBLAS handle once both norms are computed
std::cout << std::setprecision(25) << std::scientific;
std::cout << "> Nrm2 of a with KokkosBlas : " << a_nrm2 << std::endl;
std::cout << "> Nrm2 of b with KokkosBlas : " << b_nrm2 << std::endl;
std::cout << "> Nrm2 of a with cublasDnrm2: " << res_a << std::endl;
std::cout << "> Nrm2 of b with cublasDnrm2: " << res_b << std::endl;
that gives:
> Nrm2 of a with KokkosBlas : 1.0000000000000001421085472e+02
> Nrm2 of b with KokkosBlas : 1.0000000000000000000000000e+02
> Nrm2 of a with cublasDnrm2: 1.0000000000000001421085472e+02
> Nrm2 of b with cublasDnrm2: 1.0000000000000001421085472e+02
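
Taking the printed digits at face value, the two distinct values differ by one unit in the last place of a double near 100 (2^-46, about 1.42e-14). The sketch below shows one way to quantify that; `rel_diff` and `within_one_ulp` are hypothetical helper names, not part of KokkosBlas or cuBLAS.

```cpp
#include <cmath>

// Relative difference between two results, measured against the larger magnitude.
double rel_diff(double a, double b) {
  const double m = std::fmax(std::fabs(a), std::fabs(b));
  return m == 0.0 ? 0.0 : std::fabs(a - b) / m;
}

// True when a and b are the same double or adjacent representable doubles.
bool within_one_ulp(double a, double b) {
  return a == b || std::nextafter(a, b) == b;
}
```

For example, `within_one_ulp(1.0000000000000000000000000e+02, 1.0000000000000001421085472e+02)` holds, and the relative difference between the two is below 2e-16.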