AMDGPU.jl
AMD GPU (ROCm) programming in Julia
I have noticed that with KernelAbstractions the use of @atomic is very slow on AMDGPU compared to CUDA. I have a test example at https://github.com/CliMA/CGDycore.jl/tree/main/TestAMD with results for a kernel...
Could you implement rocWMMA for use with Navi3 GPUs? From what I understand, it uses the AI accelerators present in them for faster matrix multiplication. I guess this could make...
In a recent [study](https://dl.acm.org/doi/10.1145/3624062.3624278) on Frontier, a 7-point stencil kernel underperforms, reaching half the bandwidth (~300 GB/s) of its HIP counterpart (~600 GB/s) on a single MI250X. The behavior...
A debug HIP build triggers the following assertion:
```julia
julia> using AMDGPU
julia> ROCArray{Float32}(undef, 4)
julia: /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:310: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base...
```
```julia
/home/gabrielbaraldi/julia4/src/llvm-late-gc-lowering.cpp:1029: void NoteDef(State&, BBState&, int, const std::vector&): Assertion `Num >= 0' failed.

[132189] signal (6.-6): Aborted
in expression starting at /home/gabrielbaraldi/.julia/dev/AMDGPU/test/ka_tests.jl:17
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6...
```
Ok, I'm going to be honest, I cannot create a MWE for this, but I can provide a replicator. I'm not expecting this issue to be fixed soon, but I...
Hello, just a quick report and temporary solution. I was getting this issue on Arch:
```
julia> using AMDGPU
┌ Warning: HSA runtime is unavailable, compilation and runtime functionality will...
```
As described in . The main benefits would be enabling persistent on-disk caching for users while removing the somewhat tricky algorithm cache handling code on the Julia side. The main...
Adapt to https://github.com/JuliaGPU/KernelAbstractions.jl/pull/422
Once RDNA 3 GPUs can be used with this library, will it utilize the AI accelerators too, or will that need an update?