Valentin Churavy
> So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.

Correct!...
Yeah :/ Did you ever check why Revise breaks? Jameson mentioned a few weeks ago that overlay method tables don't participate in invalidations correctly. (cc: @aviatesk)
I just needed an example and I have some old data from last year. I am purely interested in the time it takes until the first time step completes. The...
Thank you @simone-silvestri! This enables https://github.com/JuliaGPU/GPUCompiler.jl/pull/557#issuecomment-2183674470 which is a great win for running massively parallel simulations.
The next one I ran into is https://github.com/CliMA/Oceananigans.jl/blob/00f028bb37f13692e24921588aeb8a9150f6dd55/src/Advection/reconstruction_coefficients.jl#L215

So one alternative is to use named tuples instead of variables.

```
@inline function compute_reconstruction_coefficients(grid, FT, scheme; order)
    method = scheme ==...
```
I think we need to look at all `eval` usage. I hit:

```
function minimum_spacing(dir, grid, ℓx, ℓy, ℓz)
    spacing = eval(Symbol(dir, :spacing))
    LX, LY, LZ = map(destantiate, (ℓx, ℓy,...
```
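For illustration only: one common way to avoid a runtime `eval(Symbol(...))` lookup is to select the function through dispatch. The sketch below is not Oceananigans code. `xspacing`, `yspacing`, `zspacing`, `spacing_function`, and the named-tuple `grid` are hypothetical stand-ins, and it assumes the direction is one of `:x`, `:y`, `:z`.

```julia
# Hypothetical stand-ins for the per-direction spacing functions.
xspacing(grid) = grid.Δx
yspacing(grid) = grid.Δy
zspacing(grid) = grid.Δz

# Select the function by dispatching on a Val-wrapped direction symbol,
# so nothing is evaluated at runtime.
spacing_function(::Val{:x}) = xspacing
spacing_function(::Val{:y}) = yspacing
spacing_function(::Val{:z}) = zspacing

# Convenience method for callers that only have a Symbol.
spacing_function(dir::Symbol) = spacing_function(Val(dir))

# Toy "grid" used only to exercise the sketch.
grid = (Δx = 0.5, Δy = 0.25, Δz = 1.0)
min_spacing = minimum(spacing_function(d)(grid) for d in (:x, :y, :z))
```

Passing `Val(:x)` directly from the call site, rather than a runtime `Symbol`, keeps the lookup resolvable at compile time.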
Yeah, this is due to KA allowing for arbitrary dimensions instead of just limiting the user to `3`. You end up in https://github.com/JuliaGPU/CUDA.jl/blob/7f725c0a117c2ba947015f48833630605501fb3a/src/CUDAKernels.jl#L178 and thereafter in https://github.com/JuliaGPU/KernelAbstractions.jl/blob/c5fe83c899b3fd29308564467c3a3722179bfe9d/src/nditeration.jl#L73

So if we...
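As a rough sketch of the general idea (not the actual KernelAbstractions implementation): supporting more dimensions than the hardware launch grid offers means a flat index has to be expanded back into an N-dimensional one. The names `ndrange`, `indices`, and `linear` below are chosen for illustration, and the example uses only Base Julia.

```julia
# A 4-dimensional iteration space, one more dimension than a 3D launch grid.
ndrange = (4, 3, 2, 5)
indices = CartesianIndices(ndrange)

# Expand a flat (linear) index into its N-dimensional Cartesian index.
linear = 37
I = indices[linear]

# Converting back recovers the flat index.
roundtrip = LinearIndices(indices)[I]
```

The expansion involves integer division and remainder per dimension, which is the kind of per-thread indexing cost worth checking in the generated code.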
Can you use `CUDA.@device_code dir="out"` for the kernels in both cases? In particular, the optimized `.ll` would be of interest.
There is a performance pitfall that I didn't expect... https://github.com/JuliaGPU/KernelAbstractions.jl/blob/c5fe83c899b3fd29308564467c3a3722179bfe9d/src/nditeration.jl#L83

```
; │┌ @ /srv/scratch/lraess/julia_depot/packages/KernelAbstractions/zPAn3/src/nditeration.jl:84 within `expand`
; ││┌ @ abstractarray.jl:1291 within `getindex`
; │││┌ @ abstractarray.jl:1336 within `_getindex`
;...
```
x-ref: https://github.com/JuliaGPU/GPUArrays.jl/pull/520