Valentin Churavy

Results: 1415 comments by Valentin Churavy

Might be interesting, I haven't looked into the execution there too closely. Do you have a benchmark?

If you can contribute it here: https://github.com/JuliaGPU/KernelAbstractions.jl/tree/main/benchmark that would be nice!

Yeah, for the CPU I often use a workgroupsize of 1024.
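
For context, a minimal KernelAbstractions sketch of what that looks like (the kernel and array names here are made up for illustration): the backend and static workgroupsize are picked when instantiating the kernel.

```julia
using KernelAbstractions

# Hypothetical example kernel: write twice the global index into A.
@kernel function double_index!(A)
    I = @index(Global)
    A[I] = 2 * I
end

A = zeros(Int, 4096)
# CPU backend with a static workgroupsize of 1024, as mentioned above:
kernel! = double_index!(CPU(), 1024)
kernel!(A; ndrange = length(A))
KernelAbstractions.synchronize(CPU())
```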

Can you share a MWE?

> At the same time, ending a kernel with an expression (which is equivalent to a return statement) is allowed. KernelAbstractions will politely ignore those and insert a `return nothing`...
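
That behavior can be seen in a small sketch (the kernel is hypothetical):

```julia
using KernelAbstractions

# The kernel body ends in an expression (`A[I] += 1`), not an explicit
# `return nothing`; per the comment above, KernelAbstractions ignores
# that implicit return value and inserts `return nothing` for you.
@kernel function add_one!(A)
    I = @index(Global)
    A[I] += 1
end
```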

Thank you! As you found, the notion of a subgroup is not necessarily consistent across backends. So I am wondering whether we need to expose that as a feature or if...

Yeah, requiring a neutral value sounds good. Couldn't each backend choose to use warp-level ops, if available, instead of having the user make this decision?
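
A hedged sketch of what a neutral-value-based workgroup reduction could look like (the names `groupreduce!`, `op`, and `neutral` are hypothetical, not KernelAbstractions API). The neutral element pads out-of-range threads so they cannot corrupt the result, and a backend that knows about warps could swap the serial fold for warp-level ops without any user-facing change:

```julia
using KernelAbstractions

@kernel function groupreduce!(out, @Const(A), op, neutral)
    gi = @index(Group, Linear)
    li = @index(Local, Linear)
    I  = @index(Global, Linear)
    N  = prod(@groupsize())
    shared = @localmem eltype(out) (N,)
    # Pad out-of-range threads with the neutral element (e.g. zero for +).
    shared[li] = I <= length(A) ? A[I] : neutral
    @synchronize
    # First thread folds the workgroup serially; a backend could replace
    # this with a tree or warp-level reduction transparently.
    if li == 1
        acc = neutral
        for i in 1:N
            acc = op(acc, shared[i])
        end
        out[gi] = acc
    end
end
```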

> would it be preferred that every thread returns the right rval? I do think in that case I will have to rethink the warp-reduction-based groupreduce.

No it...

I recommend `using Cthulhu, CUDA` and then `CUDA.@device_code_typed interactive=true`.
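
For example (the kernel here is made up; `@device_code_typed interactive=true` drops you into Cthulhu's interactive `descend` on the device-side typed IR of the launch):

```julia
using Cthulhu, CUDA

# Hypothetical kernel to inspect.
function fill_zero!(A)
    i = threadIdx().x
    A[i] = 0f0
    return
end

A = CUDA.zeros(Float32, 32)
# Interactively browse the typed IR generated for this kernel launch:
CUDA.@device_code_typed interactive=true @cuda threads=32 fill_zero!(A)
```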

Okay, so this is API-neutral? A pure addition, and we don't need anything new? Great! Otherwise we could have wrapped it in with https://github.com/JuliaGPU/KernelAbstractions.jl/pull/422