Valentin Churavy
What backend are you executing this on? Can you isolate this into an MWE?
Ah, shoot. I had expected it to be related to https://github.com/JuliaGPU/CUDA.jl/pull/2336, but this looks much more like a core Julia bug.
I am going to try and `creduce` this.
So it does belong here, since it is not reproducible on a machine without CUDA.
@kh-abd-kh I am sorry for your frustration. https://discourse.julialang.org is a better place to seek help than this issue tracker. For this particular case I am confused why it even downloaded...
> Could these things maybe live in https://github.com/anicusan/AcceleratedKernels.jl in the future? There is a dependency ordering issue: GPUArrays is the common infrastructure and this would be the fallback...
Of course JLArrays doesn't work. It uses the CPU backend, and this is `cpu=false`.
Does #436 speed things up for you? I haven't merged it since I haven't seen an impact on benchmarks.
So the question is where the overheads are coming from. Maybe you can run a profile and compare where the time is being spent. You could also use static scheduling...
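As a rough illustration of "profile, then try static scheduling", here is a minimal sketch using only Base `Threads.@threads` and the `Profile` stdlib, not KernelAbstractions itself. The workload `body` and the functions `run_static!`/`run_default!` are hypothetical stand-ins for the kernel under discussion; the point is just to time both schedules and see in the profile whether the time goes into the loop body or into task scheduling.

```julia
using Base.Threads: @threads
using Profile

# Hypothetical per-element workload. Every iteration costs about the same,
# which is the case where static scheduling tends to help most.
@inline function body(i::Int)
    s = 0.0
    for k in 1:1_000
        s += sin(i * k * 1e-3)
    end
    return s
end

# :static splits the range into one contiguous chunk per thread up front,
# so there is no per-task load balancing at runtime.
function run_static!(out::Vector{Float64})
    @threads :static for i in eachindex(out)
        out[i] = body(i)
    end
    return out
end

# Default schedule (dynamic on Julia >= 1.8).
function run_default!(out::Vector{Float64})
    @threads for i in eachindex(out)
        out[i] = body(i)
    end
    return out
end

out1 = zeros(10_000)
out2 = zeros(10_000)

# Warm up once so compilation is not included in the timings.
run_static!(out1)
run_default!(out2)

t_static  = @elapsed run_static!(out1)
t_default = @elapsed run_default!(out2)
println("static: $t_static s, default: $t_default s")

# Sample where time is spent: compare frames in `body` against
# scheduler/task frames to see how much is overhead.
Profile.clear()
@profile for _ in 1:50
    run_default!(out2)
end
Profile.print(maxdepth = 15)
```

Start Julia with several threads (e.g. `julia -t 4`) for the comparison to mean anything. For the KernelAbstractions CPU backend, my assumption is that the equivalent knob is a constructor option along the lines of `CPU(static=true)`; check the docs of the version you are on.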
@avik-pal Isn't this just a normal Julia performance pitfall? ``` @kernel function batchnorm_kernel_act!(y, @Const(act), @Const(scale), @Const(bias), @Const(x), @Const(μ), @Const(σ²)) i, j = @index(Global, NTuple) y[i, j] = act(scale[i] * (x[i,...