Anton Smirnov
Actually, test for CUDA.jl also gives this error:
```julia
function mul_kernel(A)
    i = threadIdx().x
    if i...
```
So I got confused, but with CUDA.jl if you wrap in
```julia
function mul_kernel(A)
    i = threadIdx().x
    A[i] *= A[i]
    return nothing
end

function grad(A, dA)
    autodiff_deferred(Reverse, mul_kernel, Duplicated(A, dA))...
```
@wsmoses, sorry for spamming, but are there any examples with KA not involving host code (just the kernel)?
> autodiff call is inside the device code entirely

Oh, I see! Now it works! A note somewhere in the docs might be useful (unless I missed one). Thanks for...
It works for `mul_kernel`, but fails with more complex kernels, for example one that uses the `sin` function. Error:
```julia
ERROR: InvalidIRError: compiling MethodInstance for gpu_gker(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}},...
```
> Yeah that's the same as #683

Just curious if the fix is coming relatively soon or is it more involved?
Now `amgpu_simple.jl` fails with:
```
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating...
```
The same issue occurs with CUDA, though.
I'll close it as specifying syncscope fixes the issue. Thanks!
FYI, I have disabled tests for rocSPARSE temporarily since they were crashing my Navi 3 in CI and I didn't have the time to investigate the final cause. Also for...