HDF5.jl

Automatic h5_garbage_collect() garbage collection

Open denglerchr opened this issue 10 months ago • 6 comments

Good afternoon,

There might be a memory leak in HDF5, related to using driver=Drivers.Core(; backing_store=false). I created a reduced example that can be reproduced as follows:

  1. generate a docker file including HDF5
# build with -> docker build -t hdf5test:1.0 .
FROM julia:1.11.2

RUN julia -e "import Pkg; Pkg.add([\"HDF5\", \"H5Zblosc\"])"
ENTRYPOINT ["julia"]
  2. run the following code in the docker container (e.g., run sudo docker run -it --memory=500m hdf5test:1.0 and copy the code); sooner or later it will be killed for running out of memory (OOM)
using HDF5

function main()
    while true
        h5open("abc.h5", "w"; driver=Drivers.Core(; backing_store=false)) do fid
            fid["M"] = randn(1000, 1000)
            return Vector{UInt8}(fid)
        end
        # GC.gc() # enabling or disabling this doesn't change much
    end
    return nothing
end

main()

The container's memory usage immediately jumps close to the limit and stays there for a while. With a higher memory cap, it takes longer for the container to be killed. Once the container is killed, you can run docker inspect <containerid> to confirm that it was due to memory.

Best regards, Christian Dengler

denglerchr avatar Jan 20 '25 13:01 denglerchr

Could you see if invoking HDF5.API.h5_garbage_collect() helps?

https://github.com/JuliaIO/HDF5.jl/blob/master/src%2Fapi%2Ffunctions.jl#L67

mkitti avatar Jan 20 '25 14:01 mkitti

I did a quick test: including this in the loop seems to stabilize the memory usage. I guess this is not a bug then? Or should this be called automatically somehow?
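For reference, a minimal sketch of the reproducer with the suggested call added to the loop. The function name roundtrip and the bounded iteration count are illustrative assumptions (the original loop runs forever); every HDF5 call is one already used or linked in this thread.

```julia
using HDF5

# Bounded variant of the reproducer (hypothetical helper name): each iteration
# writes an in-memory file, extracts its bytes, and then calls the HDF5
# garbage collector as suggested above.
function roundtrip(n::Integer)
    nbytes = 0
    for _ in 1:n
        bytes = h5open("abc.h5", "w"; driver=Drivers.Core(; backing_store=false)) do fid
            fid["M"] = randn(1000, 1000)
            Vector{UInt8}(fid)
        end
        nbytes = length(bytes)
        # Workaround from this thread: ask libhdf5 to release memory it holds internally.
        HDF5.API.h5_garbage_collect()
    end
    return nbytes
end

roundtrip(5)
```

Whether this fully bounds memory in every setup has not been established here; the report above only says that usage "seems to stabilize".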

denglerchr avatar Jan 20 '25 15:01 denglerchr

I would consider this to be a workaround for now.

I need to investigate further how well this is documented upstream in HDF5 itself, and when it would be appropriate to call this automatically.

Perhaps an HDF5.gc() would be warranted if this needs to be called by the user.

mkitti avatar Jan 20 '25 15:01 mkitti

OK, I'll keep this ticket open in that case.

denglerchr avatar Jan 20 '25 15:01 denglerchr

Ideally we should call this when the Julia GC is invoked, but we probably don't want to call it every time an object is freed.

One way to do this would be to add a callback into the Julia GC (so it gets called after the Julia GC is invoked). This can be done by calling jl_gc_set_cb_post_gc with a function pointer. The downside is that we can't call actual Julia code, so we would have to write a C shim around it. This is what I did for NVTX.jl: https://github.com/JuliaGPU/NVTX.jl/blob/main/src/julia.jl

simonbyrne avatar Jan 21 '25 20:01 simonbyrne

In this case with the do syntax, I think we could call the HDF5 GC when closing the "file" when we know that file is backed by allocated memory.
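Until something like that exists in HDF5.jl, the same idea can be approximated on the user side. The sketch below is an assumption, not part of HDF5.jl: with_inmemory_h5 is a hypothetical helper that wraps the do-block form of h5open and calls the HDF5 garbage collector once the in-memory file has been closed.

```julia
using HDF5

# Hypothetical user-side helper (not an HDF5.jl API): run `f` on an in-memory
# file, then call the HDF5 garbage collector after the file is closed,
# even if `f` throws.
function with_inmemory_h5(f, name::AbstractString)
    try
        return h5open(f, name, "w"; driver=Drivers.Core(; backing_store=false))
    finally
        HDF5.API.h5_garbage_collect()
    end
end

bytes = with_inmemory_h5("abc.h5") do fid
    fid["M"] = randn(10, 10)
    Vector{UInt8}(fid)
end
```

Calling the collector on every close is the simplest policy; whether that is too frequent for workloads that open many small files is an open question in this thread.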

mkitti avatar Jan 22 '25 12:01 mkitti

Can I help somehow with this issue? I'd like MAT.jl to be in a good state.

matthijscox avatar Nov 21 '25 13:11 matthijscox

> Can I help somehow with this issue? I'd like MAT.jl to be in a good state.

Could you see if invoking HDF5.API.h5_garbage_collect() helps?

I'm not sure if the issues are linked. The next step would be to create the high-level function.

mkitti avatar Nov 21 '25 13:11 mkitti