Valentin Churavy
MVectors are very small and for most scenarios I would expect the overhead of KA not to be worth it, and I would rather receive an error instead of making...
Part of `v0.9.36`
Oof, that's nasty. https://github.com/JuliaGPU/KernelAbstractions.jl/blob/1ac546fc59cc611d749fa7a50e4a1efa3393851b/src/KernelAbstractions.jl#L622 So CUDAKernels should also have used `Int`. (Can you open a PR?)
Reactant should be unnecessary, since `allowscalar` is from GPUArraysCore?
```
using LinearAlgebra
using Enzyme, GPUArraysCore
β = 100
μ = 0.3
fermi_fn(x; β, μ) = 1 / (1 + exp(β*(x-μ)))
...
```
Since `allowscalar` uses `task_local_storage` https://github.com/JuliaGPU/GPUArrays.jl/blob/3c16583dce63972a267b021cac841fbd19ec26df/lib/GPUArraysCore/src/GPUArraysCore.jl#L178-L210 and the error is for: `o: %511 = call nonnull "enzyme_type"="{[-1]:Pointer}" {} addrspace(10)* @ijl_eqtable_get({} addrspace(10)* nonnull %510, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140071093935392 to...
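As background for the `task_local_storage` point, here is a minimal sketch using only Base Julia (no GPUArraysCore or Enzyme). It shows that task-local storage is an `IdDict` owned by the current task, which is presumably why a dictionary lookup shows up in the failing call. The key `:scalar_ok` is a hypothetical name chosen for illustration, not the key GPUArraysCore uses.

```julia
# Task-local storage is a per-task IdDict.
tls = task_local_storage()
@assert tls isa IdDict

# Set a (hypothetical) flag in the current task and read it back.
task_local_storage(:scalar_ok, true)
@assert task_local_storage(:scalar_ok) === true

# A freshly spawned task starts with its own, empty storage,
# so the flag set above is not visible there.
seen_in_child = fetch(@async haskey(task_local_storage(), :scalar_ok))
@assert seen_in_child == false
```

Because the setting lives in a dictionary attached to the task, reading it is an ordinary runtime lookup rather than a compile-time constant.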
On the CPU, the majority of the time is spent compiling because precompilation statements are missing. `--trace-compile` output: https://gist.github.com/vchuravy/d2e1a70dbdb4bf76b454867c3b6fd0c2
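For readers unfamiliar with precompilation statements, a minimal sketch follows. The function `square` is a hypothetical stand-in, not part of KernelAbstractions. Running Julia with `--trace-compile=stderr` lists the signatures that were compiled at run time, and a package can request those specializations ahead of time with `precompile`.

```julia
# Hypothetical function standing in for a hot code path.
square(x) = x * x

# Ask Julia to compile the Float64 specialization before the first call.
# `precompile` returns `true` when compilation succeeded.
ok = precompile(square, (Float64,))
@assert ok

# The first real call can now reuse the already compiled specialization.
@assert square(3.0) == 9.0
```

In a package, such statements only help later sessions when they sit at the package's top level, so that the compiled code is stored in the precompile cache.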
Huh that's a fun one...
Master will need some more work to stabilize; it's a breaking release.
#600 is currently tracking some of the work needed, but otherwise when it's done and tested enough
Note that this is due to the use of `KA.@index` instead of `@index`.
```
julia> KA.@kernel function mul2_kernel(A)
           I = @index(Global, Linear)
       end
```
works.
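A fuller sketch of the working form, assuming KernelAbstractions v0.9 on the CPU backend and that the package is loaded with `using`, so that the unqualified `@index` is in scope. The alias `KA`, the kernel body, and the workgroup size of 4 are illustrative choices, not taken from the original report.

```julia
using KernelAbstractions
const KA = KernelAbstractions  # alias assumed for illustration

# `@kernel` may be qualified, but `@index` is written unqualified
# inside the kernel body.
KA.@kernel function mul2_kernel(A)
    I = @index(Global, Linear)
    A[I] = 2 * A[I]
end

A = ones(Float32, 16)
backend = KA.get_backend(A)          # CPU() for a plain Array

# Instantiate the kernel with a workgroup size of 4 and launch it over A.
mul2_kernel(backend, 4)(A; ndrange = length(A))
KA.synchronize(backend)

@assert all(A .== 2f0)
```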