Anton Smirnov
In scenarios where there is a lot of pressure on the allocator, like in [Nerf.jl](https://github.com/JuliaNeuralGraphics/Nerf.jl), this leads to `hipMallocAsync` consistently returning a `NULL` pointer, and the return status is...
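For illustration only, a hedged sketch of how an allocation failure under pressure could be handled on the Julia side: force GC so finalizers of unreferenced GPU arrays get a chance to run, then retry. `alloc_fn` is a hypothetical stand-in for the underlying allocation call, not AMDGPU.jl's actual code path.

```julia
# Hypothetical retry loop: run GC between failed attempts so device memory held by
# unreferenced arrays is released, then try the allocation again.
function alloc_with_retry(alloc_fn, bytesize::Integer; attempts::Int = 3)
    for attempt in 1:attempts
        ptr = alloc_fn(bytesize)
        ptr != C_NULL && return ptr
        GC.gc(attempt > 1)  # escalate to a full collection on later attempts
    end
    error("failed to allocate $bytesize bytes after $attempts attempts")
end
```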
I think this is resolved. Thanks!
I plan to work on 1.11 support soon. I'm finishing a small rework of how memory buffers are handled, and after that I'll focus on 1.11.
The problem is not that there is `sum(...)` at the end. You can replace it with the proposed `sum0d` or with `sum(...; dims=1:ndims(x))` to return a 0D array. Then switch from `Zygote.gradient`...
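For context, a small illustration of the difference (a plain `Array` is used here as a stand-in for a GPU array):

```julia
x = rand(Float32, 4, 4)

sum(x)                   # plain reduction: returns a host scalar
sum(x; dims=1:ndims(x))  # keeps the result as a 1×1 array, so on the GPU it can stay on the device
```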
@jpsamaroo has also suggested the `GPUScalar` approach, and I think ultimately it should be either this one or returning a `GPUArray` that will fix these performance issues. We just need to avoid...
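Roughly, such a wrapper could look like the sketch below; the names and fields are assumptions for illustration, not the actual implementation.

```julia
# Minimal sketch, assuming the wrapper holds the 1-element device buffer produced by a
# reduction and only copies it to the host when the value is actually needed.
struct GPUScalar{T,A} <: Real
    buf::A  # 1-element device array holding the value
end

GPUScalar(buf::AbstractArray{T}) where {T} = GPUScalar{T,typeof(buf)}(buf)

# The synchronizing device-to-host copy happens only here.
materialize(s::GPUScalar{T}) where {T} = convert(T, Array(s.buf)[1])

Base.convert(::Type{S}, s::GPUScalar) where {S<:AbstractFloat} = convert(S, materialize(s))
Base.Float32(s::GPUScalar) = Float32(materialize(s))
```

With something along these lines, a reduction over a device array could hand back a `GPUScalar` instead of forcing an immediate synchronizing copy; host code that genuinely needs the number converts it explicitly, while GPU-side consumers keep the buffer on the device.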
Got the `GPUScalar` approach working locally. It seems to be quite a minimal change and non-breaking (unless your code relies on the return type being e.g. `Float32` instead of `GPUScalar{Float32}`). All...
The issue is likely that the GC is not GPU-aware and does not finalize GPU arrays in time, so memory just keeps growing, even though only a fraction of it is...
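A manual workaround is to reclaim memory explicitly per step; a rough sketch, where `step!` and `batches` are illustrative placeholders (only `GC.gc` and `CUDA.reclaim` are real APIs):

```julia
using CUDA

# Force finalizers to run and return cached blocks to the driver after each step,
# so device memory is reclaimed even without a GPU-aware GC.
function train_with_reclaim!(step!, batches)
    for batch in batches
        step!(batch)    # whatever allocates temporary GPU arrays
        GC.gc(false)    # incremental collection so array finalizers actually run
        CUDA.reclaim()  # hand freed, cached blocks back to the driver
    end
end
```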
Also, while setting

```
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "10%"
ENV["JULIA_CUDA_SOFT_MEMORY_LIMIT"] = "5%"
```

does cap the maximum memory usage, it does not really improve the performance, since when you hit a limit...
I've experimented with yet another approach (https://github.com/JuliaGPU/AMDGPU.jl/pull/708) that further improves performance significantly. Instead of recording memory allocations and then bulk-freeing them, I've implemented a caching memory allocator that keeps memory allocations...
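The idea, in a rough sketch (host `Vector{UInt8}` standing in for device buffers; names are illustrative, not the code from the PR):

```julia
# Freed buffers go into per-size free lists and are reused on the next allocation
# of the same size, instead of a full free/malloc round-trip every time.
const CACHE = Dict{Int,Vector{Vector{UInt8}}}()  # bytesize => cached buffers

function cached_alloc(bytesize::Int)
    bucket = get!(Vector{Vector{UInt8}}, CACHE, bytesize)
    return isempty(bucket) ? Vector{UInt8}(undef, bytesize) : pop!(bucket)
end

# "Freeing" just returns the buffer to its bucket for later reuse.
cached_free(buf::Vector{UInt8}) =
    push!(get!(Vector{Vector{UInt8}}, CACHE, length(buf)), buf)

# Under memory pressure the cache can be dropped, actually releasing the memory.
invalidate!() = (foreach(empty!, values(CACHE)); empty!(CACHE))
```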
@maleadt, the remaining CUDA.jl issues are because of method signatures, which can easily be updated to handle the change. I can open a respective PR in CUDA.jl if you think this...