Anton Smirnov
In scenarios where there is a lot of pressure on the allocator, like in [Nerf.jl](https://github.com/JuliaNeuralGraphics/Nerf.jl), this leads to `hipMallocAsync` consistently returning a `NULL` pointer, and the return status is...
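For illustration only, a hedged sketch of how an allocation failure under pressure could be handled on the Julia side: force GC so finalizers of unreferenced GPU arrays get a chance to run, then retry. `alloc_fn` is a hypothetical stand-in for the underlying allocation call, not AMDGPU.jl's actual code path.

```julia
# Hypothetical retry loop: run GC between failed attempts so device memory held by
# unreferenced arrays is released, then try the allocation again.
function alloc_with_retry(alloc_fn, bytesize::Integer; attempts::Int = 3)
    for attempt in 1:attempts
        ptr = alloc_fn(bytesize)
        ptr != C_NULL && return ptr
        GC.gc(attempt > 1)  # escalate to a full collection on later attempts
    end
    error("failed to allocate $bytesize bytes after $attempts attempts")
end
```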
I think this is resolved. Thanks!
I plan to work on 1.11 support soon. I'm finishing a small rework of how memory buffers are handled, and after that I'll focus on 1.11.
The problem is not that there is `sum(...)` at the end. You can replace it with the proposed `sum0d` or with `sum(...; dims=1:ndims(x))` to return a 0D array. Then switch from `Zygote.gradient`...
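For context, a small illustration of the difference (a plain `Array` is used here as a stand-in for a GPU array):

```julia
x = rand(Float32, 4, 4)

sum(x)                   # plain reduction: returns a host scalar
sum(x; dims=1:ndims(x))  # keeps the result as a 1×1 array, so on the GPU it can stay on the device
```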
@jpsamaroo has also suggested the `GPUScalar` approach, and I think ultimately it should be either this one or returning a `GPUArray` that will fix these performance issues. We just need to avoid...
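Roughly, such a wrapper could look like the sketch below; the names and fields are assumptions for illustration, not the actual implementation.

```julia
# Minimal sketch, assuming the wrapper holds the 1-element device buffer produced by a
# reduction and only copies it to the host when the value is actually needed.
struct GPUScalar{T,A} <: Real
    buf::A  # 1-element device array holding the value
end

GPUScalar(buf::AbstractArray{T}) where {T} = GPUScalar{T,typeof(buf)}(buf)

# The synchronizing device-to-host copy happens only here.
materialize(s::GPUScalar{T}) where {T} = convert(T, Array(s.buf)[1])

Base.convert(::Type{S}, s::GPUScalar) where {S<:AbstractFloat} = convert(S, materialize(s))
Base.Float32(s::GPUScalar) = Float32(materialize(s))
```

With something along these lines, a reduction over a device array could hand back a `GPUScalar` instead of forcing an immediate synchronizing copy; host code that genuinely needs the number converts it explicitly, while GPU-side consumers keep the buffer on the device.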
Got the `GPUScalar` approach working locally. It seems to be quite a minimal change and non-breaking (unless your code relies on the return type being e.g. `Float32` instead of `GPUScalar{Float32}`). All...
The issue is likely that the GC is not GPU-aware and does not finalize GPU arrays in time, so memory just keeps growing, even though only a fraction of it is...
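A manual workaround is to reclaim memory explicitly per step; a rough sketch, where `step!` and `batches` are illustrative placeholders (only `GC.gc` and `CUDA.reclaim` are real APIs):

```julia
using CUDA

# Force finalizers to run and return cached blocks to the driver after each step,
# so device memory is reclaimed even without a GPU-aware GC.
function train_with_reclaim!(step!, batches)
    for batch in batches
        step!(batch)    # whatever allocates temporary GPU arrays
        GC.gc(false)    # incremental collection so array finalizers actually run
        CUDA.reclaim()  # hand freed, cached blocks back to the driver
    end
end
```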
Also, while setting

```
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "10%"
ENV["JULIA_CUDA_SOFT_MEMORY_LIMIT"] = "5%"
```

does cap the maximum memory usage, it does not really improve the performance, since when you hit a limit...
I've experimented with yet another approach (https://github.com/JuliaGPU/AMDGPU.jl/pull/708) that further improves performance significantly. Instead of recording memory allocations and then bulk-freeing them, I've implemented a caching memory allocator that keeps memory allocations...
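The idea, in a rough sketch (host `Vector{UInt8}` standing in for device buffers; names are illustrative, not the code from the PR):

```julia
# Freed buffers go into per-size free lists and are reused on the next allocation
# of the same size, instead of a full free/malloc round-trip every time.
const CACHE = Dict{Int,Vector{Vector{UInt8}}}()  # bytesize => cached buffers

function cached_alloc(bytesize::Int)
    bucket = get!(Vector{Vector{UInt8}}, CACHE, bytesize)
    return isempty(bucket) ? Vector{UInt8}(undef, bytesize) : pop!(bucket)
end

# "Freeing" just returns the buffer to its bucket for later reuse.
cached_free(buf::Vector{UInt8}) =
    push!(get!(Vector{Vector{UInt8}}, CACHE, length(buf)), buf)

# Under memory pressure the cache can be dropped, actually releasing the memory.
invalidate!() = (foreach(empty!, values(CACHE)); empty!(CACHE))
```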
@maleadt, the remaining CUDA.jl issues are because of method signatures, which can easily be updated to handle the change. I can open a respective PR in CUDA.jl if you think this...