Anton Smirnov

213 comments by Anton Smirnov

> @pxl-th mentioned the [CU_MEMHOSTALLOC_PORTABLE](https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__TYPES_g50f4528d46bda58b592551654a7ee0ff.html) CUDA flag. Can we use that in AMDGPU?

You can, but at the moment it is not pretty:

```julia
bytesize = prod(dims) * sizeof(T)
buf...
```
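Roughly, the idea is to make host memory directly visible to the GPU. A minimal sketch (not the truncated code above; it assumes `unsafe_wrap(ROCArray, ptr, dims)` registers the host memory as described in the AMDGPU.jl docs, and the exact signature may differ between versions):

```julia
using AMDGPU

# Sketch: expose an existing host array to the GPU without an explicit copy.
# Assumption: `unsafe_wrap(ROCArray, ptr, dims)` pins/registers the host memory.
x = zeros(Float32, 1024)

GC.@preserve x begin
    xd = unsafe_wrap(ROCArray, pointer(x), size(x)) # GPU-visible view of `x`
    xd .+= 1f0                                      # kernel writes go straight to host memory
    AMDGPU.synchronize()
end

@assert all(==(1f0), x) # the host array sees the update
```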

Yes, you should use AMDGPU 1.0; it has important multi-GPU fixes. Here's the code. I don't have access to a multi-GPU system at the moment, but at least on 1 GPU...
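As a placeholder for the truncated snippet, a hedged sketch of the device-selection pattern (using the documented `AMDGPU.devices()` / `AMDGPU.device!` API; the per-device work is only illustrative):

```julia
using AMDGPU

# Sketch: run the same computation once on every visible GPU.
results = Float32[]
for dev in AMDGPU.devices()
    AMDGPU.device!(dev)             # make `dev` the current device
    x = AMDGPU.rand(Float32, 1024)  # allocated on the currently selected device
    push!(results, sum(x))          # reduction executes on that device
end
@show results
```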

Ah... that's a bug in AMDGPU.jl with how the compilation target's features are set. I'll fix it.

```julia
using AMDGPU

function gpu_kernel_xx!(x, Nx::Int64, Ny::Int64)
    i = AMDGPU.threadIdx().x
    i ≤ 10 || return
    workitems = CartesianIndices((10, 1))
    @inbounds workitems[i] # accessing `workitems` triggers miscompilation
    y = 0f0
    for...
```

@vchuravy so this is a bug with ROCm LLVM?

If you have some infrastructure set up for this... Going through every commit manually seems like it would take a lot of time.

I observe similar behavior with LLVM 18 & Julia 1.12. Julia MWE (reduced from a generic matrix multiplication kernel):

```julia
using LLVM.Interop
using AMDGPU

function matmatmul_kernel_bad!(C::AbstractArray{T}, A) where T
    assume(length(C)...
```

Closing this, as we now have a caching allocator that avoids GC and allows for fast reuse of allocations: https://juliagpu.github.io/GPUArrays.jl/dev/interface/#Caching-Allocator

It can also be used with `@btime` to avoid blowing up...
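For reference, usage looks roughly like this (a sketch following the linked GPUArrays docs; the `AllocCache`, `@cached`, and `unsafe_free!` names come from that page and may change between versions; AMDGPU is used as the backend here):

```julia
using AMDGPU, GPUArrays

# Allocations made inside `@cached` are kept in `cache` and reused on the next
# iteration instead of being freed, so the hot loop produces no GC pressure.
cache = GPUArrays.AllocCache()

for i in 1:100
    GPUArrays.@cached cache begin
        x = AMDGPU.rand(Float32, 1024^2)
        sum(sin.(x))
    end
end

GPUArrays.unsafe_free!(cache) # release the cached buffers once done
```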

Ah, the problem is when one of the padding values is 0. I'll push the fix later today; thanks for posting this!
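To make the condition concrete: the exact failing operation is the one in the issue, but "one of the padding values is 0" means a mixed padding tuple like the one below (an illustrative `NNlib.conv` call, not necessarily the reported case):

```julia
using NNlib

x = rand(Float32, 8, 8, 1, 1)  # WHCN input
w = rand(Float32, 3, 3, 1, 1)  # 3×3 kernel
y = conv(x, w; pad=(1, 0))     # pad the first spatial dim by 1, the second by 0
@show size(y)                  # (8, 6, 1, 1)
```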

Sorry for the delay, should be fixed by https://github.com/FluxML/NNlib.jl/pull/595