KernelAbstractions.jl
Heterogeneous programming in Julia
A few backends have the option to create arrays using unified memory. Using unified memory for certain algorithms can have significant (positive) impact on performance by removing the need for...
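As a minimal sketch of what this looks like on one such backend: CUDA.jl's `cu` conversion accepts a `unified = true` keyword to request unified (managed) memory, which both the host and the device can address without explicit copies. This is a CUDA.jl-specific illustration, not a KernelAbstractions API.

```julia
using CUDA

# Host data we want to share between CPU and GPU.
A = rand(Float32, 1024)

# `unified = true` asks for unified (managed) memory: the driver migrates
# pages between host and device on demand, so no explicit `copyto!` is needed.
dA = cu(A; unified = true)

# Kernels launched on `dA` and host reads after `synchronize()` see the same
# allocation, removing the usual device-to-host transfer step.
dA .+= 1f0
CUDA.synchronize()
```

Whether unified memory helps is algorithm-dependent: it removes transfer boilerplate, but page migration can cost more than an explicit bulk copy for streaming access patterns.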
Reproducer: CUDA.jl: ```julia using CUDA using Adapt CUDA.allowscalar(false) # using KernelAbstractions is_valid_index(meta, ui) = 1 ≤ ui[1] ≤ params(meta)[4] && 1 ≤ ui[2] ≤ params(meta)[1] && 1 ≤ ui[3] ≤...
At the moment, `supports_atomics` returns a boolean, but different backends have different levels of support. For example, Metal essentially only supports 32-bit integers and floats, with 64-bit integer atomics being...
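One hypothetical direction (not the current API) is to make the query element-type-aware, so callers can dispatch on what a backend actually supports rather than a single boolean. A sketch, with the method table and type sets purely illustrative:

```julia
using KernelAbstractions
using Metal

# Hypothetical per-eltype capability query; the real API is
# `supports_atomics(backend)::Bool`.
atomics_supported(::Backend, ::Type) = false  # conservative fallback

# Metal broadly supports 32-bit integer and float atomics...
atomics_supported(::MetalBackend, ::Type{<:Union{Int32,UInt32,Float32}}) = true

# ...while 64-bit integer atomics are only partially available,
# so the sketch reports them as unsupported.
atomics_supported(::MetalBackend, ::Type{<:Union{Int64,UInt64}}) = false
```

A caller could then fall back to a non-atomic (e.g. two-pass) formulation when `atomics_supported(backend, eltype(A))` is `false`, instead of failing at compile time.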
Implement reduction API. Supports two types of algorithms:
- thread: reduction performed by threads; uses shmem of length `groupsize`, no bank conflicts, no divergence.
- warp: reduction performed by `shfl_down`...
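The thread variant can be sketched as a shared-memory tree reduction in today's kernel language. This is an illustrative sketch, assuming a power-of-two workgroup size and an `ndrange` that is a multiple of it; it is not the proposed API itself:

```julia
using KernelAbstractions

# One partial sum per workgroup: load into shared memory of length
# `groupsize`, then halve the active range each step.
@kernel function groupreduce!(out, @Const(a))
    li = @index(Local, Linear)
    gi = @index(Group, Linear)
    i  = @index(Global, Linear)

    N = @uniform prod(@groupsize())
    shmem = @localmem eltype(a) (N,)

    shmem[li] = a[i]
    @synchronize

    # The stride pattern keeps active lanes contiguous (no divergence
    # within a warp) and strided accesses avoid shared-memory bank conflicts.
    s = N ÷ 2
    while s > 0
        if li ≤ s
            shmem[li] += shmem[li + s]
        end
        @synchronize
        s ÷= 2
    end

    if li == 1
        out[gi] = shmem[1]
    end
end
```

The warp variant would replace the shared-memory loop with register-level shuffles (`shfl_down`-style intrinsics), which KernelAbstractions does not currently expose portably.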
Kernel closures support passing `ndrange` and `workgroupsize` as keyword arguments: https://github.com/JuliaGPU/KernelAbstractions.jl/blob/97419620494baa2e45541a6f2015413d6fa9315b/src/KernelAbstractions.jl#L661 The kernel function itself probably should too, while it currently only accepts positional versions of these arguments (used to...
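For context, a sketch of the asymmetry being described: the instantiated kernel closure takes `ndrange`/`workgroupsize` as keywords, while the kernel-instantiation call itself takes them positionally.

```julia
using KernelAbstractions

@kernel function fill_one!(a)
    i = @index(Global, Linear)
    a[i] = one(eltype(a))
end

backend = CPU()
a = KernelAbstractions.zeros(backend, Float32, 64)

# The closure returned by instantiation accepts keyword arguments:
kernel! = fill_one!(backend)
kernel!(a; ndrange = length(a))

# But instantiation itself only takes positional workgroupsize/ndrange:
kernel_static! = fill_one!(backend, 64, length(a))
kernel_static!(a)

KernelAbstractions.synchronize(backend)
```

The issue suggests the instantiation call should accept the keyword form as well.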
I'm not sure how to explain this behavior, it's like `@localmem` cancels the effects of the kernel: ```julia julia> using Metal, KernelAbstractions julia> backend = Metal.MetalBackend() MetalBackend() julia> @kernel function...
I had a request to do #429 for fastmath, so here is my attempt. #431 is also related 2 issues: 1. My only issue is that I don't know what...
To address the 0% code coverage, you may need to follow the steps outlined in https://discourse.julialang.org/t/psa-new-version-of-codecov-action-requires-additional-setup/109857 to use a Codecov token for report upload: https://github.com/JuliaGPU/KernelAbstractions.jl/actions/runs/13809668007/job/38632985795#step:13:34
This is mainly to start a conversation around the KA kernel language, as it currently starts accumulating more functionality / cruft; for example, if I want a high-performance kernel as...
While looking at #583 I noticed that the `aug_fwd` kernel looks like: ``` function aug_fwd( ctx, f::FT, ::Val{ModifiedBetween}, subtape, ::Val{TapeType}, args..., ) where {ModifiedBetween, FT, TapeType} # A2 = Const{Nothing}...