KernelAbstractions.jl

Heterogeneous programming in Julia

Results 165 KernelAbstractions.jl issues

Ok, maybe I'm just tired, but...

```
using AMDGPU
using ROCKernels
using KernelAbstractions

@kernel function f_test_kernel!()
    @print(1, '\n')
end

function f_test!(AT; numcores = 4, numthreads = 256)
    if AT ==...
```

Hi, thank you for developing this library - I would like to write optimised kernels for common GPU algorithms such as reduce, scan, radix sort, etc. similar to [CUB](https://nvlabs.github.io/cub/) but...

The internal API in the docs includes `KernelAbstractions.partition` (https://github.com/JuliaGPU/KernelAbstractions.jl/blob/2c67ba27578d2f3c4e38722fe103043e7ae5f442/docs/src/api.md#L23), but there is no docstring (https://github.com/JuliaGPU/KernelAbstractions.jl/blob/2c67ba27578d2f3c4e38722fe103043e7ae5f442/src/KernelAbstractions.jl#L409). Either remove it from the docs or add a docstring?

documentation

```
julia> a = ROCArray(rand(10,10));

julia> KernelAbstractions.get_device(a)
ERROR: StackOverflowError:
Stacktrace:
 [1] get_device(A::ROCMatrix{Float64}) (repeats 79984 times)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/DqITC/src/KernelAbstractions.jl:355
```

Calling `AMDGPU.device(a)` works:

```
julia> AMDGPU.device(a)
GPU-XX [AMD Radeon RX 6700...
```

bug
AMD

Thank you for this beautiful library! In contrast with another recent issue, I'm finding a rather large speedup for the CPU kernel that I have implemented. The doc does not...

Would have caught https://github.com/JuliaGPU/Metal.jl/issues/336.

```
# This works flawlessly
@kernel function f()
    a = @index(Global, Cartesian)
    @print(a[1])
```

```
# This doesn't compile in CPU
@kernel function f()
    a = let
        @index(Global, Cartesian)
    end...
```

Running the CUDA benchmarks from the [HPCBenchmarks.jl](https://github.com/PTsolvers/HPCBenchmarks.jl/tree/main/CUDA) tests shows a significant performance drop when using KA with dynamic range definition. The tests below are performed on GH200 using a local CUDA 12.4 install...