KernelAbstractions.jl
Heterogeneous programming in Julia
Hi, I'm trying to use KA for the first time and I'm wondering about the performance I get for a simple kernel copying two 2D matrices of Float32 (I know...
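For reference, a minimal 2D copy kernel can be written like this. This is a sketch assuming the KernelAbstractions v0.9 API; the name `copy_kernel!` is illustrative.

```julia
using KernelAbstractions

# Copy every element of `src` into `dst` using a Cartesian global index,
# so the same kernel works for any 2D (or n-D) array shape.
@kernel function copy_kernel!(dst, @Const(src))
    I = @index(Global, Cartesian)
    @inbounds dst[I] = src[I]
end

A = rand(Float32, 1024, 1024)
B = similar(A)
backend = get_backend(A)            # CPU() for a plain Array
kernel! = copy_kernel!(backend)
kernel!(B, A; ndrange = size(A))    # launch over the full index space
KernelAbstractions.synchronize(backend)
```

On the CPU backend, performance for such memory-bound kernels is usually dominated by memory bandwidth rather than the kernel body itself.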
I use `JuliaFormatter` to format my source code. This adds explicit `return` statements. This leads to ``` ERROR: LoadError: Return statement not permitted in a kernel function sum2_kernel! ``` even...
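One way around this, assuming the formatter's `always_use_return` option is what inserts the explicit `return`, is to disable it in the project's formatter configuration so `@kernel` bodies stay valid:

```toml
# .JuliaFormatter.toml — keep the formatter from appending `return`
# statements, which KernelAbstractions rejects inside @kernel functions
always_use_return = false
```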
Apparently, DiffEqGPU is failing on v1.10 on Metal: https://buildkite.com/julialang/diffeqgpu-dot-jl/builds/1006#018cf9e1-e6db-42da-b270-1afbf733a6d4
Hi! Thank you very much for this project. I'm working on a kernel where I need to do a "max", for example `a = max(1, 2)`, but I'm getting this...
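A common cause of `max` failing inside GPU kernels is mixing literal types with the array's element type. A hedged sketch (assumed KA v0.9 API; `clamp_kernel!` is an illustrative name) that keeps the literal in Float32:

```julia
using KernelAbstractions

# Element-wise max against a literal; the 0f0 literal matches the
# Float32 element type, avoiding a type-unstable mixed-type max.
@kernel function clamp_kernel!(a, @Const(b))
    i = @index(Global)
    @inbounds a[i] = max(b[i], 0f0)
end

b = randn(Float32, 16)
a = similar(b)
backend = get_backend(b)
clamp_kernel!(backend)(a, b; ndrange = length(b))
KernelAbstractions.synchronize(backend)
```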
It would be great if, in a multi-device system, the device id that runs a KA kernel could be set through a function call. cc: @vchuravy
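KA itself does not currently expose such a call (hence the request); with the CUDA.jl backend, device selection is done per task through the backend package before launching. A hedged sketch:

```julia
using CUDA, KernelAbstractions

# Select the second GPU for the current task; KA kernels launched with
# CUDABackend() afterwards run on the currently active device.
CUDA.device!(1)
backend = CUDABackend()
# ... instantiate and launch KA kernels against `backend` as usual
```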
I noticed in Stencils.jl that when I'm using a fast stencil (e.g. 3x3 window summing over a `Matrix{Bool}`) that the indexing in `__thread_run` takes longer than actually reading and summing...
Together with @weymouth we are trying to create a kernel that loops over an n-dimensional array and applies a function to each element. While we can certainly manage to do...
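One hedged way to express "apply `f` to each element of an n-dimensional array" with a single kernel, assuming the KA v0.9 API (the name `map_kernel!` is illustrative; the function is passed as an ordinary kernel argument):

```julia
using KernelAbstractions

# Apply `f` element-wise; the Cartesian global index makes the kernel
# dimension-agnostic, so the same code covers any ndims.
@kernel function map_kernel!(f, out, @Const(A))
    I = @index(Global, Cartesian)
    @inbounds out[I] = f(A[I])
end

A = rand(Float32, 4, 4, 4)
out = similar(A)
backend = get_backend(A)
map_kernel!(backend)(x -> 2x, out, A; ndrange = size(A))
KernelAbstractions.synchronize(backend)
```

On GPU backends the passed function must be GPU-compatible (no captured CPU-only state), but plain functions and simple closures generally work.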
I am unsure why my previous PR was closed, but here are the changes. - I added docs - I added tests It was my first time writing tests, and they...
On CPU always use `NoDynamicCheck()`, just finish the last partial workgroup with `DynamicCheck()`
Given that `DynamicCheck()` breaks SIMD this can be an order of magnitude faster for some inexpensive tasks. I'll write up a better MWE, but this is the scale of it...
I had a request from a user to use warp-level semantics from CUDA: `sync_warp`, `warpsize`, and stuff here: https://cuda.juliagpu.org/stable/api/kernel/#Warp-level-functions. They seem to be available here: https://rocm.docs.amd.com/projects/rocPRIM/en/latest/warp_ops/index.html, but I don't know...