rodent
Remove `rv_all` from GPU-generated code
GPU kernels get polluted by the `rv_all` instruction. The instruction is filtered out only if Thorin is compiled with RV, and we should not expect RV to be available when Rodent is used only for GPU.
The "fix" is quite simple. It would be great to get rid of the code duplication as well, but that is beyond the scope of this PR.
We should probably have some kind of generic, portable SIMD intrinsics that work regardless of the platform (RV, CUDA, AMDHSA, OpenCL, Shady...).
Yes, I agree. We should also add the "fma" instruction to the math builtins (maybe with a fallback for non-LLVM backends, e.g., OpenCL). I think there are more general-purpose intrinsics that might be handy on all systems, provided a well-behaved fallback can be defined.
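To make the fallback idea concrete, here is a minimal sketch in plain C (not Rodent or Thorin code). The name `portable_fmaf` and the `HAVE_NATIVE_FMA` switch are hypothetical and only stand in for whatever mechanism a backend would use to signal that a fused instruction is available.

```c
#include <math.h>

/* Hypothetical portable wrapper: `portable_fmaf` and HAVE_NATIVE_FMA are
   illustrative names, not part of Rodent, Thorin, or any backend. */
static inline float portable_fmaf(float a, float b, float c) {
#if defined(HAVE_NATIVE_FMA)
    /* Fused path: a * b + c is computed with a single rounding step. */
    return fmaf(a, b, c);
#else
    /* Fallback path: separate multiply and add, which may round twice
       (a compiler is still free to contract this into a fused op). */
    return a * b + c;
#endif
}
```

The two paths can differ in the last bit of the result, so a "well-behaved" fallback here means numerically close rather than bit-identical. For example, `portable_fmaf(2.0f, 3.0f, 4.0f)` yields `10.0f` on either path because every intermediate value is exactly representable.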