Valentin Churavy comments

Results: 1413 comments of Valentin Churavy

Kernel only gives proper result when adding `@print()`

What backend are you executing this on? Can you isolate this into an MWE?

`illegal hardware instruction` via precompiling CPU and CUDA KA kernels

Ah shoot. I had expected it to be related to https://github.com/JuliaGPU/CUDA.jl/pull/2336, but this looks much more like a core Julia bug.

`illegal hardware instruction` via precompiling CPU and CUDA KA kernels

I am going to try and `creduce` this.

`illegal hardware instruction` via precompiling CPU and CUDA KA kernels

So it does belong here since on a machine without CUDA it is not reproducible.

cuda artifacts are downloaded repeatedly per Julia session

@kh-abd-kh I am sorry for your frustration. https://discourse.julialang.org is a better place to seek help than this issue tracker. For this particular case I am confused why it even downloaded...

Implement mapreduce

> Could these things maybe live in https://github.com/anicusan/AcceleratedKernels.jl in the future?

There is a dependency ordering issue: GPUArrays is the common infrastructure and this would be the fallback...

Implement mapreduce

Of course JLArrays doesn't work. That uses the CPU backend and this is `cpu=false`.

How can we make KA fast on CPUs?

Does #436 speed things up for you? I haven't merged it since I haven't seen an impact on benchmarks.

How can we make KA fast on CPUs?

So the question would be where the overheads are coming from, so maybe you can run a profile and compare where the time is being spent. You could also use static scheduling....
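
A minimal sketch of that suggestion, not taken from the original thread: it profiles a toy kernel on the CPU backend and also runs it with static scheduling for comparison. `scale_kernel!` and `run!` are hypothetical names made up for illustration, and the `CPU(; static = true)` constructor is assumed to be available as in KernelAbstractions 0.9; newer releases may differ.

```julia
using KernelAbstractions
using Profile

# Hypothetical toy kernel, used only to have something to measure.
@kernel function scale_kernel!(y, @Const(x), a)
    i = @index(Global)
    y[i] = a * x[i]
end

# Hypothetical helper: launch the kernel on `backend` and wait for it to finish.
function run!(backend, y, x, a; workgroupsize = 1024)
    kernel! = scale_kernel!(backend, workgroupsize)
    kernel!(y, x, a; ndrange = length(y))
    KernelAbstractions.synchronize(backend)
    return y
end

x = rand(Float32, 2^20)
y_dynamic = similar(x)
y_static = similar(x)

dynamic = CPU()                # default scheduling
static = CPU(; static = true)  # static scheduling (assumed KA 0.9 API)

# Warm up both variants so compilation does not show up in the profile.
run!(dynamic, y_dynamic, x, 2.0f0)
run!(static, y_static, x, 2.0f0)

# Profile the default variant; repeat with `static` to compare the two.
Profile.clear()
@profile for _ in 1:100
    run!(dynamic, y_dynamic, x, 2.0f0)
end
Profile.print(; mincount = 10)
```

Comparing the two profiles (or simply timing both variants with `@time`) should indicate whether task scheduling, rather than the kernel body, dominates the runtime.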

How can we make KA fast on CPUs?

@avik-pal Isn't this just a normal Julia performance pitfall?

```
@kernel function batchnorm_kernel_act!(y, @Const(act), @Const(scale), @Const(bias), @Const(x), @Const(μ), @Const(σ²))
    i, j = @index(Global, NTuple)
    y[i, j] = act(scale[i] * (x[i,...
```