Valentin Churavy
> Is there a way to have this selected automatically for some use cases, or make it default? Any better strategy than the current one? I was thinking about this,...
Sorry, could we keep the conversation focused here? If there is an issue with atomic operations in CUDA.jl, please open an issue there. I assume that: > My code speeds...
Can you split the issue into the device-side RNG and the host-side hang? They are unrelated and likely need different work to fix.
I don't think removing the artifacts is the right answer. This breaks downstream JLLs' ability to build against ROCm and makes it harder for us to understand what version...
In the past I had success using SIMD.jl for this (packed ops on CUDA).
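For reference, a minimal host-side sketch of what packed operations with SIMD.jl look like; the `Vec`-based pattern is the same idea one would try in GPU code. The array `xs`, the data, and the lane width of 4 are illustrative assumptions, not from the original thread.

```julia
# Minimal sketch of packed ops with SIMD.jl (host-side illustration;
# lane width and data are assumptions, not from the original thread).
using SIMD

a = Vec{4, Float32}((1f0, 2f0, 3f0, 4f0))
b = Vec{4, Float32}((10f0, 20f0, 30f0, 40f0))
c = a + b                            # one packed add across all four lanes

xs = Float32[1, 2, 3, 4, 5, 6, 7, 8]
v = vload(Vec{4, Float32}, xs, 1)    # packed load of xs[1:4]
vstore(v * 2f0, xs, 5)               # packed store into xs[5:8]
```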
One could, but it would probably require quite a bit of work. CUDA.jl's WMMA support was the work of a full-time master's student.
> Well, it has to be done at some point. Why should only NVIDIA get all the goodies? :P Are you volunteering?
Will need to change https://github.com/JuliaGPU/KernelAbstractions.jl/blob/c5fe83c899b3fd29308564467c3a3722179bfe9d/Project.toml#L23 so that the compat entry is only `0.7.1`.
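Roughly, the change would be a Project.toml `[compat]` entry like the excerpt below; `SomeDep` is a placeholder, since the actual dependency name on L23 isn't quoted here. Note that in Julia's Pkg compat semantics `"0.7.1"` allows `[0.7.1, 0.8.0)`, while `"=0.7.1"` pins exactly that version.

```toml
[compat]
# Hypothetical entry; the real dependency name sits on L23 of the
# linked Project.toml. "0.7.1" allows [0.7.1, 0.8.0); use "=0.7.1"
# to pin to exactly that version.
SomeDep = "0.7.1"
```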
Will need a rebase for #478.
The tests are a bit sparse, and shouldn't they be enabled for more than just the CPU backend?
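As a rough sketch of what that could look like, assuming the current KernelAbstractions.jl API and CUDA.jl as the non-CPU backend of interest; the `double!` kernel is made up for illustration:

```julia
using Test, KernelAbstractions
using CUDA  # assumption: CUDA.jl as the extra backend under test

# Toy kernel, purely for illustration.
@kernel function double!(x)
    i = @index(Global)
    @inbounds x[i] *= 2
end

# Run the identical testset on every available backend.
backends = Any[CPU()]
CUDA.functional() && push!(backends, CUDABackend())

@testset "double! on $(nameof(typeof(backend)))" for backend in backends
    x = KernelAbstractions.allocate(backend, Float32, 16)
    fill!(x, 1f0)
    double!(backend)(x; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    @test all(Array(x) .== 2f0)
end
```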