Anton Smirnov

213 comments by Anton Smirnov

> @pxl-th mentioned the [CU_MEMHOSTALLOC_PORTABLE](https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__TYPES_g50f4528d46bda58b592551654a7ee0ff.html) CUDA flag. Can we use that in AMDGPU?

You can, but at the moment it is not pretty:

```julia
bytesize = prod(dims) * sizeof(T)
buf...
```
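Roughly, the idea is to make host memory directly visible to the GPU. A minimal sketch (not the truncated code above; it assumes `unsafe_wrap(ROCArray, ptr, dims)` registers the host memory as described in the AMDGPU.jl docs, and the exact signature may differ between versions):

```julia
using AMDGPU

# Sketch: expose an existing host array to the GPU without an explicit copy.
# Assumption: `unsafe_wrap(ROCArray, ptr, dims)` pins/registers the host memory.
x = zeros(Float32, 1024)

GC.@preserve x begin
    xd = unsafe_wrap(ROCArray, pointer(x), size(x)) # GPU-visible view of `x`
    xd .+= 1f0                                      # kernel writes go straight to host memory
    AMDGPU.synchronize()
end

@assert all(==(1f0), x) # the host array sees the update
```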

Yes, you should use AMDGPU 1.0; it has important multi-GPU fixes. Here's the code. I don't have access to a multi-GPU system at the moment, but at least on 1 GPU...
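As a placeholder for the truncated snippet, a hedged sketch of the device-selection pattern (using the documented `AMDGPU.devices()` / `AMDGPU.device!` API; the per-device work is only illustrative):

```julia
using AMDGPU

# Sketch: run the same computation once on every visible GPU.
results = Float32[]
for dev in AMDGPU.devices()
    AMDGPU.device!(dev)             # make `dev` the current device
    x = AMDGPU.rand(Float32, 1024)  # allocated on the currently selected device
    push!(results, sum(x))          # reduction executes on that device
end
@show results
```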

Ah... that's a bug in AMDGPU.jl with how the compilation target's features are set. I'll fix it.

```julia
using AMDGPU

function gpu_kernel_xx!(x, Nx::Int64, Ny::Int64)
    i = AMDGPU.threadIdx().x
    i ≤ 10 || return
    workitems = CartesianIndices((10, 1))
    @inbounds workitems[i] # accessing `workitems` triggers miscompilation
    y = 0f0
    for...
```

@vchuravy so this is a bug with ROCm LLVM?

If you have some infrastructure set up for this... Going through every commit manually seems like it would take a lot of time.

I observe similar behavior with LLVM 18 & Julia 1.12. Julia MWE (reduced from a generic matrix multiplication kernel):

```julia
using LLVM.Interop
using AMDGPU

function matmatmul_kernel_bad!(C::AbstractArray{T}, A) where T
    assume(length(C)...
```

Closing this, as we now have a caching allocator that avoids GC and allows for fast reuse of allocations: https://juliagpu.github.io/GPUArrays.jl/dev/interface/#Caching-Allocator

It can also be used with `@btime` to avoid blowing up...
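For reference, usage looks roughly like this (a sketch following the linked GPUArrays docs; the `AllocCache`, `@cached`, and `unsafe_free!` names come from that page and may change between versions; AMDGPU is used as the backend here):

```julia
using AMDGPU, GPUArrays

# Allocations made inside `@cached` are kept in `cache` and reused on the next
# iteration instead of being freed, so the hot loop produces no GC pressure.
cache = GPUArrays.AllocCache()

for i in 1:100
    GPUArrays.@cached cache begin
        x = AMDGPU.rand(Float32, 1024^2)
        sum(sin.(x))
    end
end

GPUArrays.unsafe_free!(cache) # release the cached buffers once done
```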

Ah, the problem is when one of the padding values is 0. I'll push the fix later today; thanks for posting this!
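To make the condition concrete: the exact failing operation is the one in the issue, but "one of the padding values is 0" means a mixed padding tuple like the one below (an illustrative `NNlib.conv` call, not necessarily the reported case):

```julia
using NNlib

x = rand(Float32, 8, 8, 1, 1)  # WHCN input
w = rand(Float32, 3, 3, 1, 1)  # 3×3 kernel
y = conv(x, w; pad=(1, 0))     # pad the first spatial dim by 1, the second by 0
@show size(y)                  # (8, 6, 1, 1)
```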

Sorry for the delay, should be fixed by https://github.com/FluxML/NNlib.jl/pull/595