kokkos-kernels
kokkos-kernels copied to clipboard
Wrong results from scal and axpy when input type promotion is involved
This code gives the wrong result for doing a KokkosBlas::scal
and ::axpy
when the input has different type than the output view:
#include <iostream>
#include <Kokkos_Core.hpp>
#include <KokkosBlas1_sum.hpp>
#include <KokkosBlas1_scal.hpp>
#include <KokkosBlas1_axpby.hpp>
using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using MemorySpace = ExecutionSpace::memory_space;
using Device = Kokkos::Device<ExecutionSpace,MemorySpace>;
int main(int argc, char* argv[]) {
Kokkos::initialize(argc,argv); {
Kokkos::View<float*, Device> a("a", 3), c("c", 3);
Kokkos::View<int*, Device> b("b", 3);
Kokkos::deep_copy(ExecutionSpace{}, b, 4);
const auto s = KokkosBlas::sum(b);
const float n = float(1) / s;
KokkosBlas::scal(a, n, b);
KokkosBlas::axpy(n, b, c);
Kokkos::fence();
auto aH = Kokkos::create_mirror_view_and_copy(Kokkos::DefaultHostExecutionSpace{}, a);
auto cH = Kokkos::create_mirror_view_and_copy(Kokkos::DefaultHostExecutionSpace{}, c);
std::cout << "sum = " << s << " norm = " << n << "\n";
std::cout << " a = \n";
for(int i = 0; i < aH.extent(0); ++i) {
std::cout << aH(i) << "\n";
}
std::cout << " c = \n";
for(int i = 0; i < cH.extent(0); ++i) {
std::cout << cH(i) << "\n";
}
} Kokkos::finalize();
}
Here both a
and c
ends up being zeros unless the type of b
is changed from int
to float
.
@Char-Aznable thanks for reporting this issue, let me tag a few developers here @vqd8a @e10harvey they might have some ideas. We will have a look but maybe not before January.
@Char-Aznable It looks to me in the KokkosBlas::scal
and KokkosBlas::axpy
, the scalar type of a
are internally unified with the value_type of the input view X
(a
is n
and X
is b
, respectively, in your code). When the value_type
of b
is int
, the scalar type of n
is internally set to int
too. So ,there must be a cast from float
to int
which makes n
be zero.
I think we can fix this issue by unifying the scalar type of a
with the value_type of the output view R
(for KokkosBlas::scal
) or with the value_type of the output view Y
(for KokkosBlas::axpy
) instead. But I am not sure if this breaks anything else.
@e10harvey, Any thoughts?
Can we do something like std::is_constructible<R::value_type, decltype(std::declval<LHS::value_type>() * std::declval<RHS::value_type>())>::value
to assert the input output type compatibility so to avoid having to explicitly type cast?
Implicit type casting should always upcast to float
in this case. Where are we explicitly down casting in scal or axpy?
We should set ourselves a standard that we document on the wiki or somewhere else that spells the expected behavior when we are dealing with mixed precision. On top of adding clarity for users it would provide developers a clear guideline on how to handle this.
Implicit type casting should always upcast to
float
in this case. Where are we explicitly down casting in scal or axpy?
@e10harvey , I think they are in here: https://github.com/kokkos/kokkos-kernels/blob/6c786cd6900b977e7b3b19a1f88c0c433a49cbcc/src/blas/KokkosBlas1_scal.hpp#L99-L104 https://github.com/kokkos/kokkos-kernels/blob/6c786cd6900b977e7b3b19a1f88c0c433a49cbcc/src/blas/KokkosBlas1_axpby.hpp#L105-L113