KernelAbstractions.jl
Heterogeneous programming in Julia
Ok, maybe I'm just tired, but... ``` using AMDGPU using ROCKernels using KernelAbstractions @kernel function f_test_kernel!() @print(1, '\n') end function f_test!(AT; numcores = 4, numthreads = 256) if AT ==...
Hi, thank you for developing this library - I would like to write optimised kernels for common GPU algorithms such as reduce, scan, radix sort, etc. similar to [CUB](https://nvlabs.github.io/cub/) but...
The internal API in the docs includes `KernelAbstractions.partition` https://github.com/JuliaGPU/KernelAbstractions.jl/blob/2c67ba27578d2f3c4e38722fe103043e7ae5f442/docs/src/api.md#L23 but there is no docstring https://github.com/JuliaGPU/KernelAbstractions.jl/blob/2c67ba27578d2f3c4e38722fe103043e7ae5f442/src/KernelAbstractions.jl#L409 Either remove it from the docs or add a docstring?
``` julia> a = ROCArray(rand(10,10)); julia> KernelAbstractions.get_device(a) ERROR: StackOverflowError: Stacktrace: [1] get_device(A::ROCMatrix{Float64}) (repeats 79984 times) @ KernelAbstractions ~/.julia/packages/KernelAbstractions/DqITC/src/KernelAbstractions.jl:355 ``` Calling `AMDGPU.device(a)` works: ``` julia> AMDGPU.device(a) GPU-XX [AMD Radeon RX 6700...
Thank you for this beautiful library! In contrast with another recent issue, I'm finding a rather large speedup for the CPU kernel that I have implemented. The doc does not...
Would have caught https://github.com/JuliaGPU/Metal.jl/issues/336.
``` # This works flawlessly @kernel function f() a = @index(Global, Cartesian) @print(a[1]) ``` ``` # This doesn't compile in CPU @kernel function f() a = let @index(Global, Cartesian) end...
Running the CUDA benchmarks from the [HPCBenchmarks.jl](https://github.com/PTsolvers/HPCBenchmarks.jl/tree/main/CUDA) tests shows a significant performance drop when using KA with dynamic range definition. The tests below are performed on a GH200 using a local CUDA 12.4 install...