KernelAbstractions.jl
Heterogeneous programming in Julia
This is my first pass at adding `printf` support. I'm still experimenting, but I may need help with this!
Hey there, I'm really enjoying KernelAbstractions so far! Working with `MultiEvent`s feels clunky, though. From looking at the source, I've used it as follows:
```julia
events = []
for layer in...
```
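To make the pattern concrete, here is a minimal sketch of collecting per-launch events and waiting on them together; the `scale!` kernel and the `layers` collection are placeholders, and it assumes the event API of this KA version (`MultiEvent` over a tuple of events, then `wait` on the result):
```julia
using KernelAbstractions, CUDA

@kernel function scale!(A, s)
    I = @index(Global)
    @inbounds A[I] *= s
end

function scale_layers!(layers, device=CUDADevice())
    kernel = scale!(device, 256)
    events = []
    for (i, layer) in enumerate(layers)
        # each launch returns an event; collect them so they can all be waited on later
        push!(events, kernel(layer, Float32(i); ndrange=length(layer)))
    end
    # wrap the per-launch events and wait on all of them at once
    wait(MultiEvent(Tuple(events)))
end
```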
The link to `@uniform` in the docs is [broken](https://juliagpu.gitlab.io/KernelAbstractions.jl/api/@ref). Maybe we can/should set `strict = true` in the docs?
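For reference, a sketch of what enabling strict mode in `docs/make.jl` could look like, assuming Documenter's `strict` keyword; the exact `makedocs` arguments here are placeholders:
```julia
# docs/make.jl
using Documenter, KernelAbstractions

makedocs(;
    modules  = [KernelAbstractions],
    sitename = "KernelAbstractions.jl",
    strict   = true,  # turn doc-build warnings (e.g. broken @ref cross-references) into errors
)
```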
We're trying to combine KernelAbstractions with StructArrays, but run into problems when nested `StructArray`s are used. A reproducible example:
```julia
using CUDA, KernelAbstractions, StructArrays

@kernel function copy_kernel!(A, @Const(B))
    I = @index(Global)...
```
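The excerpt above is cut off; here is a sketch of what a full reproducer along these lines might look like. The nested layout, the `replace_storage(CuArray, ...)` call, and the launch parameters are assumptions on my part:
```julia
using CUDA, KernelAbstractions, StructArrays

@kernel function copy_kernel!(A, @Const(B))
    I = @index(Global)
    @inbounds A[I] = B[I]
end

# a StructArray whose `pos` component is itself a StructArray (the nested case)
make_nested(n) = StructArray(pos = StructArray(x = rand(Float32, n), y = rand(Float32, n)),
                             m   = rand(Float32, n))

n = 1024
A = StructArrays.replace_storage(CuArray, make_nested(n))
B = StructArrays.replace_storage(CuArray, make_nested(n))

kernel = copy_kernel!(CUDADevice(), 256)
wait(kernel(A, B; ndrange=n))
```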
See the title. Works in a `CUDA` kernel and in `KA` on Julia 1.4.1. MWE:
```julia
using CUDA
using KernelAbstractions

function cuda_kernel(A)
    @inbounds A[1] = eps(A[1])
    nothing
end

@kernel function...
```
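The MWE is cut off before the KernelAbstractions part; a sketch of how the rest presumably looks, where the kernel name, the test array, and the launch parameters are my guesses:
```julia
@kernel function ka_kernel!(A)
    @inbounds A[1] = eps(A[1])
end

A = CUDA.ones(Float32, 1)

# plain CUDA.jl launch
@cuda cuda_kernel(A)

# KernelAbstractions launch of the equivalent kernel
kernel = ka_kernel!(CUDADevice(), 1)
wait(kernel(A; ndrange=1))
```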
Not really a bug in KA, but @vchuravy asked me to post this: when `uniform` memory is used together with a `@synchronize` inside an `if` statement, one needs to be careful due...
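As a schematic of the hazard (not the original code): if `@synchronize` sits behind a condition that only some work-items take, the barrier is divergent. The kernel below is invented purely to illustrate that shape:
```julia
using KernelAbstractions

@kernel function risky!(A)
    I = @index(Global, Linear)
    @uniform T = eltype(A)   # a value meant to be uniform across the workgroup
    if I <= 16               # only some work-items take this branch...
        @synchronize         # ...so this barrier is divergent and can hang or give wrong results
        @inbounds A[I] = zero(T)
    end
end
```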
It would be handy if there were a mode in which dynamic dispatch in a kernel running on the CPU threw an error.
```julia
using KernelAbstractions

@kernel function f_mwe(dt)
    0.0:dt:1.0
end

version = CUDADevice()
wait(version, f_mwe(version)(0.1; ndrange=10))
```
```
InvalidIRError: compiling kernel gpu_f_mwe(Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize,KernelAbstractions.NDIteration.DynamicCheck,Nothing,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},KernelAbstractions.NDIteration.NDRange{1,KernelAbstractions.NDIteration.DynamicSize,KernelAbstractions.NDIteration.DynamicSize,CartesianIndices{1,Tuple{Base.OneTo{Int64}}},CartesianIndices{1,Tuple{Base.OneTo{Int64}}}}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks}, typeof(gpu_f_mwe), Float64) resulted in invalid LLVM IR
Reason: unsupported call through...
```
I've heard that there's a performance degradation when using kwargs in kernels, so here's my attempt at an MWE (`test/kwarg_performance.jl`):
```julia
using KernelAbstractions
using CUDA
using Test

foo_kwarg(;a=1) = a+2
foo_parg(a=1) =...
```
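The snippet is cut off; a sketch of how the comparison might be completed, assuming `foo_parg` mirrors `foo_kwarg` and using placeholder kernel names and sizes:
```julia
using KernelAbstractions, CUDA

foo_kwarg(;a=1) = a+2   # as in the snippet above
foo_parg(a=1) = a+2     # assumed positional counterpart

@kernel function kernel_kwarg!(A)
    I = @index(Global)
    @inbounds A[I] = foo_kwarg(a = A[I])
end

@kernel function kernel_parg!(A)
    I = @index(Global)
    @inbounds A[I] = foo_parg(A[I])
end

A = CUDA.ones(Float32, 2^20)
k_kw = kernel_kwarg!(CUDADevice(), 256)
k_pa = kernel_parg!(CUDADevice(), 256)

# compare kwarg vs positional-argument calls inside the kernels
# (run each line twice so the first, compilation-heavy call is not what you compare)
@time wait(k_kw(A; ndrange=length(A)))
@time wait(k_pa(A; ndrange=length(A)))
```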
This code fails when using `CUDADevice()`; when using `CPU()`, however, everything works just fine. Michael Abbott identified this as a problem with the broadcasting; replacing the broadcasts with `map` resolves...
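The failing code itself isn't shown in the excerpt; the following is only a schematic of the kind of substitution described, with an invented kernel, where an element-wise broadcast inside the kernel body is swapped for `map`:
```julia
using KernelAbstractions

@kernel function offset!(points, shift)
    I = @index(Global)
    p = points[I]                 # e.g. an NTuple{3,Float32} per work-item
    # points[I] = p .+ shift      # broadcast form: the kind of expression reported to fail on CUDADevice()
    points[I] = map(+, p, shift)  # map form: the replacement said to resolve it
end
```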