KernelAbstractions.jl
Heterogeneous programming in Julia
A few backends have the option to create arrays using unified memory. Using unified memory for certain algorithms can have significant (positive) impact on performance by removing the need for...
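As a minimal sketch of what this looks like on one such backend: CUDA.jl's `cu` conversion accepts a `unified = true` keyword to request unified (managed) memory, which both the host and the device can address without explicit copies. This is a CUDA.jl-specific illustration, not a KernelAbstractions API.

```julia
using CUDA

# Host data we want to share between CPU and GPU.
A = rand(Float32, 1024)

# `unified = true` asks for unified (managed) memory: the driver migrates
# pages between host and device on demand, so no explicit `copyto!` is needed.
dA = cu(A; unified = true)

# Kernels launched on `dA` and host reads after `synchronize()` see the same
# allocation, removing the usual device-to-host transfer step.
dA .+= 1f0
CUDA.synchronize()
```

Whether unified memory helps is algorithm-dependent: it removes transfer boilerplate, but page migration can cost more than an explicit bulk copy for streaming access patterns.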
Reproducer: CUDA.jl: ```julia using CUDA using Adapt CUDA.allowscalar(false) # using KernelAbstractions is_valid_index(meta, ui) = 1 ≤ ui[1] ≤ params(meta)[4] && 1 ≤ ui[2] ≤ params(meta)[1] && 1 ≤ ui[3] ≤...
At the moment, `supports_atomics` returns a boolean, but different backends have different levels of support. For example, Metal essentially only supports 32-bit integers and floats, with 64-bit integer atomics being...
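One hypothetical direction (not the current API) is to make the query element-type-aware, so callers can dispatch on what a backend actually supports rather than a single boolean. A sketch, with the method table and type sets purely illustrative:

```julia
using KernelAbstractions
using Metal

# Hypothetical per-eltype capability query; the real API is
# `supports_atomics(backend)::Bool`.
atomics_supported(::Backend, ::Type) = false  # conservative fallback

# Metal broadly supports 32-bit integer and float atomics...
atomics_supported(::MetalBackend, ::Type{<:Union{Int32,UInt32,Float32}}) = true

# ...while 64-bit integer atomics are only partially available,
# so the sketch reports them as unsupported.
atomics_supported(::MetalBackend, ::Type{<:Union{Int64,UInt64}}) = false
```

A caller could then fall back to a non-atomic (e.g. two-pass) formulation when `atomics_supported(backend, eltype(A))` is `false`, instead of failing at compile time.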
Implement reduction API. Supports two types of algorithms:
- thread: reduction performed by threads; uses shmem of length `groupsize`, no bank conflicts, no divergence.
- warp: reduction performed by `shfl_down`...
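The thread variant can be sketched as a shared-memory tree reduction in today's kernel language. This is an illustrative sketch, assuming a power-of-two workgroup size and an `ndrange` that is a multiple of it; it is not the proposed API itself:

```julia
using KernelAbstractions

# One partial sum per workgroup: load into shared memory of length
# `groupsize`, then halve the active range each step.
@kernel function groupreduce!(out, @Const(a))
    li = @index(Local, Linear)
    gi = @index(Group, Linear)
    i  = @index(Global, Linear)

    N = @uniform prod(@groupsize())
    shmem = @localmem eltype(a) (N,)

    shmem[li] = a[i]
    @synchronize

    # The stride pattern keeps active lanes contiguous (no divergence
    # within a warp) and strided accesses avoid shared-memory bank conflicts.
    s = N ÷ 2
    while s > 0
        if li ≤ s
            shmem[li] += shmem[li + s]
        end
        @synchronize
        s ÷= 2
    end

    if li == 1
        out[gi] = shmem[1]
    end
end
```

The warp variant would replace the shared-memory loop with register-level shuffles (`shfl_down`-style intrinsics), which KernelAbstractions does not currently expose portably.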
Kernel closures support passing `ndrange` and `workgroupsize` as keyword arguments: https://github.com/JuliaGPU/KernelAbstractions.jl/blob/97419620494baa2e45541a6f2015413d6fa9315b/src/KernelAbstractions.jl#L661 The kernel function itself probably should too, while it currently only accepts positional versions of these arguments (used to...
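For context, a sketch of the asymmetry being described: the instantiated kernel closure takes `ndrange`/`workgroupsize` as keywords, while the kernel-instantiation call itself takes them positionally.

```julia
using KernelAbstractions

@kernel function fill_one!(a)
    i = @index(Global, Linear)
    a[i] = one(eltype(a))
end

backend = CPU()
a = KernelAbstractions.zeros(backend, Float32, 64)

# The closure returned by instantiation accepts keyword arguments:
kernel! = fill_one!(backend)
kernel!(a; ndrange = length(a))

# But instantiation itself only takes positional workgroupsize/ndrange:
kernel_static! = fill_one!(backend, 64, length(a))
kernel_static!(a)

KernelAbstractions.synchronize(backend)
```

The issue suggests the instantiation call should accept the keyword form as well.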
I'm not sure how to explain this behavior, it's like `@localmem` cancels the effects of the kernel: ```julia julia> using Metal, KernelAbstractions julia> backend = Metal.MetalBackend() MetalBackend() julia> @kernel function...
I had a request to do #429 for fastmath, so here is my attempt. #431 is also related 2 issues: 1. My only issue is that I don't know what...
To address the 0% code coverage, you may need to follow the steps outlined in https://discourse.julialang.org/t/psa-new-version-of-codecov-action-requires-additional-setup/109857 to use a Codecov token for report upload: https://github.com/JuliaGPU/KernelAbstractions.jl/actions/runs/13809668007/job/38632985795#step:13:34
This is mainly to start a conversation around the KA kernel language, as it currently starts accumulating more functionality / cruft; for example, if I want a high-performance kernel as...
While looking at #583 I noticed that the `aug_fwd` kernel looks like: ``` function aug_fwd( ctx, f::FT, ::Val{ModifiedBetween}, subtape, ::Val{TapeType}, args..., ) where {ModifiedBetween, FT, TapeType} # A2 = Const{Nothing}...