Valentin Churavy

Results: 1415 comments by Valentin Churavy

Might be interesting, I haven't looked into the execution there too closely. Do you have a benchmark?

If you can contribute it here: https://github.com/JuliaGPU/KernelAbstractions.jl/tree/main/benchmark that would be nice!

Yeah, for the CPU I often use a workgroupsize of 1024.
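
For context, a minimal KernelAbstractions sketch of what that looks like (the kernel and array names here are made up for illustration): the backend and static workgroupsize are picked when instantiating the kernel.

```julia
using KernelAbstractions

# Hypothetical example kernel: write twice the global index into A.
@kernel function double_index!(A)
    I = @index(Global)
    A[I] = 2 * I
end

A = zeros(Int, 4096)
# CPU backend with a static workgroupsize of 1024, as mentioned above:
kernel! = double_index!(CPU(), 1024)
kernel!(A; ndrange = length(A))
KernelAbstractions.synchronize(CPU())
```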

Can you share a MWE?

> At the same time, ending a kernel with an expression (which is equivalent to a return statement) is allowed. KernelAbstractions will politely ignore those and insert a `return nothing`...
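
That behavior can be seen in a small sketch (the kernel is hypothetical):

```julia
using KernelAbstractions

# The kernel body ends in an expression (`A[I] += 1`), not an explicit
# `return nothing`; per the comment above, KernelAbstractions ignores
# that implicit return value and inserts `return nothing` for you.
@kernel function add_one!(A)
    I = @index(Global)
    A[I] += 1
end
```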

Thank you! As you found, the notion of a subgroup is not necessarily consistent across backends. So I am wondering whether we need to expose that as a feature or if...

Yeah, requiring a neutral value sounds good. Couldn't each backend choose to use warp-level ops, if available, instead of having the user make this decision?
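
A hedged sketch of what a neutral-value-based workgroup reduction could look like (the names `groupreduce!`, `op`, and `neutral` are hypothetical, not KernelAbstractions API). The neutral element pads out-of-range threads so they cannot corrupt the result, and a backend that knows about warps could swap the serial fold for warp-level ops without any user-facing change:

```julia
using KernelAbstractions

@kernel function groupreduce!(out, @Const(A), op, neutral)
    gi = @index(Group, Linear)
    li = @index(Local, Linear)
    I  = @index(Global, Linear)
    N  = prod(@groupsize())
    shared = @localmem eltype(out) (N,)
    # Pad out-of-range threads with the neutral element (e.g. zero for +).
    shared[li] = I <= length(A) ? A[I] : neutral
    @synchronize
    # First thread folds the workgroup serially; a backend could replace
    # this with a tree or warp-level reduction transparently.
    if li == 1
        acc = neutral
        for i in 1:N
            acc = op(acc, shared[i])
        end
        out[gi] = acc
    end
end
```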

> would it be preferred that every thread returns the right rval? I do think in that case I will have to rethink the warp-reduction-based groupreduce.

No it...

I recommend `using Cthulhu, CUDA` and then `CUDA.@device_code_typed interactive=true`.
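
For example (the kernel here is made up; `@device_code_typed interactive=true` drops you into Cthulhu's interactive `descend` on the device-side typed IR of the launch):

```julia
using Cthulhu, CUDA

# Hypothetical kernel to inspect.
function fill_zero!(A)
    i = threadIdx().x
    A[i] = 0f0
    return
end

A = CUDA.zeros(Float32, 32)
# Interactively browse the typed IR generated for this kernel launch:
CUDA.@device_code_typed interactive=true @cuda threads=32 fill_zero!(A)
```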

Okay, so this is API-neutral? A pure addition, and we don't need anything new? Great! Otherwise we could have wrapped it in with https://github.com/JuliaGPU/KernelAbstractions.jl/pull/422