blas(axpby): execution space instance semantics not honored
Using KokkosBlas::axpby, I could easily end up with broken code if both of the following conditions are met:
- use rank-0 views for the coefficients
- pass an execution space instance on which these coefficients are being modified by some preceding kernel
The issue is clear when looking, e.g., at the following line: https://github.com/kokkos/kokkos-kernels/blob/b3a4bdf6973dceed7715c2fdc5f9499af54af2d8/blas/src/KokkosBlas1_axpby.hpp#L109
Adding an exec_space.fence() before fetching the value of the rank-0 view fixes it.
As a side note, I think there are potentially other issues, e.g. here:
https://github.com/kokkos/kokkos-kernels/blob/b3a4bdf6973dceed7715c2fdc5f9499af54af2d8/blas/src/KokkosBlas1_axpby.hpp#L137
where passing exec_space to Kokkos::deep_copy seems necessary.
Point 2 is definitely a problem.
For point 1, @romintomasetti, the block that line 109 is in should only be entered if the coefficients are actual scalars, not rank-0 views. Could you please clarify the issue you're seeing?