Valentin Churavy

1413 comments by Valentin Churavy

> What will the low-level interface look like?

Much more like programming OpenCL.

```
import KernelIntrinsics

function vadd(a, b, c)
    i = KernelIntrinsics.get_global_id()
    @inbounds c[i] = a[i] + b[i]
    return...
```

Since you don't use global indices, you should be able to add `unsafe_indices`.

This is tricky to fix. IIRC one needs to perform a `cas` loop for atomics smaller than 4 bytes. Atomix should use https://github.com/JuliaConcurrent/Atomix.jl/blob/e60c518e3ffd2c9d4e96104f16f2a970a69e4289/lib/AtomixCUDA/src/AtomixCUDA.jl#L38, which does claim to support Float16: https://github.com/JuliaGPU/CUDA.jl/blob/14de0097ff7c26932cc4a175840961cc7d3f396e/src/device/intrinsics/atomics.jl#L195 What...
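For readers unfamiliar with the `cas` loop idea, here is a minimal CPU-side sketch of it in plain Julia. It is not the Atomix.jl or CUDA.jl implementation: the function name `atomic_add_f16!` and the layout (two `Float16` lanes packed into one `UInt32`) are assumptions made purely for illustration. It only shows how a 16-bit update can be built on top of a 32-bit compare-and-swap.

```julia
# Hypothetical helper (not part of Atomix.jl or CUDA.jl): emulate an atomic
# Float16 add on one 16-bit lane of a 32-bit word using a compare-and-swap loop.
function atomic_add_f16!(word::Threads.Atomic{UInt32}, hi::Bool, val::Float16)
    shift = hi ? 16 : 0                  # which 16-bit lane to update
    mask = UInt32(0xffff) << shift
    old = word[]                         # initial snapshot of the full word
    while true
        # Extract the lane and reinterpret its bits as Float16.
        cur = reinterpret(Float16, UInt16((old & mask) >> shift))
        new_bits = UInt32(reinterpret(UInt16, cur + val)) << shift
        new = (old & ~mask) | new_bits   # leave the other lane untouched
        # atomic_cas! returns the value that was stored before the attempt.
        seen = Threads.atomic_cas!(word, old, new)
        seen == old && return cur        # swap succeeded; return the previous lane value
        old = seen                       # the word changed underneath us; retry
    end
end

word = Threads.Atomic{UInt32}(0)
Threads.@threads for _ in 1:100
    atomic_add_f16!(word, false, Float16(1))
end
lo = reinterpret(Float16, UInt16(word[] & 0xffff))
```

The loop retries whenever either lane of the word was modified concurrently, which is why sub-word atomics built this way can be slower under contention than native ones.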

It might be that we end up in https://github.com/JuliaConcurrent/UnsafeAtomicsLLVM.jl instead of UnsafeAtomicsCUDA.jl

What happens when you load `AtomixCUDA`?

I won't be able to look at this in detail until August. For now, I would recommend just writing a CUDA.jl kernel and using `CUDA.@atomic`.

Fairly late to the game here, but I would find something like this useful. I do wonder if we need the code-duplication and if we maybe could use dispatch to...

This works when switching to ROCm 5.4:

```
FROM rocm/rocm-terminal:5.4
RUN sudo apt-get update && \
    sudo apt-get dist-upgrade -y && \
    sudo apt-get install -y wget vim
RUN wget...
```

```julia-repl
julia-repl> @show code_typed(runtime_mixed_call, Tuple{Val{(false, true)}, typeof(threading_run), Ref{typeof(m)}})
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = (isa)(arg_1, Base.RefValue{var"#inner#29"{Vector{Float64}, Float64, UnitRange{Int64}}})::Bool
└──      goto #3 if not %1
2 ─ %3 = π...
```