Valentin Churavy
> Is there a way to have this selected automatically for some use cases, or make it default? Any better strategy than the current one? I was thinking about this,...
Sorry, could we keep the conversation focused here? If there is an issue with atomic operations in CUDA.jl, please open an issue there. I assume that: > My code speeds...
Can you split the issue into the device-side RNG and the host-side hang? They are unrelated and likely need different work to fix.
I don't think removing the artifacts is the right answer. This breaks downstream JLLs' ability to build against ROCm and makes it harder for us to understand what version...
In the past I had success using SIMD.jl for this (packed ops on CUDA).
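For reference, a minimal host-side sketch of what packed operations with SIMD.jl look like; the `Vec`-based pattern is the same idea one would try in GPU code. The array `xs`, the data, and the lane width of 4 are illustrative assumptions, not from the original thread.

```julia
# Minimal sketch of packed ops with SIMD.jl (host-side illustration;
# lane width and data are assumptions, not from the original thread).
using SIMD

a = Vec{4, Float32}((1f0, 2f0, 3f0, 4f0))
b = Vec{4, Float32}((10f0, 20f0, 30f0, 40f0))
c = a + b                            # one packed add across all four lanes

xs = Float32[1, 2, 3, 4, 5, 6, 7, 8]
v = vload(Vec{4, Float32}, xs, 1)    # packed load of xs[1:4]
vstore(v * 2f0, xs, 5)               # packed store into xs[5:8]
```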
One could, but it would probably require quite a bit of work. CUDA.jl's WMMA support was the work of a full-time master's student.
> Well, it has to be done at some point. Why should only NVIDIA get all the goodies? :P Are you volunteering?
Will need to change https://github.com/JuliaGPU/KernelAbstractions.jl/blob/c5fe83c899b3fd29308564467c3a3722179bfe9d/Project.toml#L23 so that the compat entry is only `0.7.1`.
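Roughly, the change would be a Project.toml `[compat]` entry like the excerpt below; `SomeDep` is a placeholder, since the actual dependency name on L23 isn't quoted here. Note that in Julia's Pkg compat semantics `"0.7.1"` allows `[0.7.1, 0.8.0)`, while `"=0.7.1"` pins exactly that version.

```toml
[compat]
# Hypothetical entry; the real dependency name sits on L23 of the
# linked Project.toml. "0.7.1" allows [0.7.1, 0.8.0); use "=0.7.1"
# to pin to exactly that version.
SomeDep = "0.7.1"
```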
Will need a rebase for #478.
The tests are a bit sparse, and shouldn't they be enabled for more than just the CPU backend?
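As a rough sketch of what that could look like, assuming the current KernelAbstractions.jl API and CUDA.jl as the non-CPU backend of interest; the `double!` kernel is made up for illustration:

```julia
using Test, KernelAbstractions
using CUDA  # assumption: CUDA.jl as the extra backend under test

# Toy kernel, purely for illustration.
@kernel function double!(x)
    i = @index(Global)
    @inbounds x[i] *= 2
end

# Run the identical testset on every available backend.
backends = Any[CPU()]
CUDA.functional() && push!(backends, CUDABackend())

@testset "double! on $(nameof(typeof(backend)))" for backend in backends
    x = KernelAbstractions.allocate(backend, Float32, 16)
    fill!(x, 1f0)
    double!(backend)(x; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    @test all(Array(x) .== 2f0)
end
```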