MAT.jl

Memory leak when reading mat files v7

Open ymtoo opened this issue 2 years ago • 5 comments

MWE:

using MAT

for _ ∈ 1:1000000
    matread("test/v7/array.mat")
end

Memory usage keeps growing until the process runs out of memory (OOM). Reading v6 and v7.3 mat files works fine.

Julia and package version:

julia> versioninfo()
Julia Version 1.8.4
Commit 00177ebc4fc (2022-12-23 21:32 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

(jl_Qk9Uql) pkg> st
Status `/tmp/jl_Qk9Uql/Project.toml`
  [23992714] MAT v0.10.3

ymtoo avatar Mar 03 '23 08:03 ymtoo

I am also seeing severe memory leaks with MAT v0.10.7 on Julia 1.11.2 when reading mat files.

ilkerduymaz avatar Mar 02 '25 13:03 ilkerduymaz

I suspect this is related to the memory leak issue in the HDF5 library (JuliaIO/HDF5.jl#1186).

ilkerduymaz avatar Jun 03 '25 10:06 ilkerduymaz

Not sure what's going on in the v6/v7 files. Note that v5/v6/v7 all use a similar binary file format, while v7.3 uses HDF5.

It's interesting that v7/array.mat uses more total memory in general:

julia> @time matread("test/v6/array.mat");
  0.000587 seconds (130 allocations: 5.359 KiB)

julia> @time matread("test/v7/array.mat");
  0.000928 seconds (298 allocations: 245.035 KiB)

Going into the code, MAT_v5.read_matrix encounters MAT_v5.miMATRIX types for v6 files and MAT_v5.miCOMPRESSED types for v7 files. For v7 this means it calls ZlibDecompressorStream(IOBuffer(read!(f, Vector{UInt8}(undef, nbytes))))

Perhaps there's an issue there. Either the ZlibDecompressorStream or the IOBuffer itself?
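
One way to test that suspicion in isolation is to run the same decompression pattern without MAT.jl. The following is a minimal sketch, assuming CodecZlib.jl is the package that provides ZlibDecompressorStream here. The payload, the helper name decompress_once, and the iteration counts are made up for illustration. It closes the stream explicitly, so comparing it against a variant without the close call may show whether unreleased zlib state contributes to the growth.

```julia
using CodecZlib

# Hypothetical stand-in for the bytes of a compressed element in a v7 file.
raw = rand(UInt8, 64 * 1024)
payload = transcode(ZlibCompressor, raw)

# Decompress one payload and always close the stream afterwards.
function decompress_once(bytes::Vector{UInt8})
    stream = ZlibDecompressorStream(IOBuffer(bytes))
    try
        return read(stream)
    finally
        close(stream)  # finalizes the codec and closes the wrapped IOBuffer
    end
end

for n in 1:2_000
    decompress_once(payload)
    if n % 500 == 0
        # Sys.maxrss() reports the peak resident set size in bytes.
        println(n, ": peak RSS = ", Sys.maxrss() ÷ 2^20, " MiB")
    end
end
```

If the peak RSS stays flat here but grows once the close call is removed, that would point at the stream handling rather than at IOBuffer. This is only a diagnostic idea, not a confirmed cause.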

Hmm. I recall BufferedStreams.jl can help with gzip decompression performance. I also recall a new package being announced for improved buffering... If I have some more time I'll look deeper into this.

matthijscox avatar Nov 20 '25 15:11 matthijscox

I tried on Julia 1.12.1 and it doesn't OOM for me with either v6 or v7; memory usage is stable. This may have been improved by the newer memory layout types, which IOBuffer also uses.

Wrapping the IOStream with BufferedInputStream did not improve performance, so ignore that idea.

I'm a little concerned that I seem to get segfaults when I try to kill the loop, but they seem to be related to printing after the interrupt. When I add a ; after the matread call, it doesn't segfault.

Reading v7.3 files definitely keeps growing my memory usage, so that HDF5 memory leak issue seems real.

matthijscox avatar Nov 21 '25 13:11 matthijscox

Calling the HDF5 garbage collector doesn't help for me, so I'm not sure it's the HDF5 issue. This code still increases memory usage slowly over time:

using MAT, HDF5

for n ∈ 1:1000000
    if mod(n, 1000) == 0
        println(n)
        HDF5.API.h5_garbage_collect()
    end
    matread("test/v7.3/array.mat")
end

Calling GC.gc() after 100-200k runs reduces memory usage a little, but only ~10% or so.

I wonder if profiling memory usage could shed some light on the matter, though I'm not sure it would catch any file IO left open.
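
As a cheap first step before a full profile, memory could be sampled periodically while the loop runs. The sketch below uses only Base functions; track_memory and its keyword names are hypothetical helpers, not part of MAT.jl or HDF5.jl, and the workload shown is a stand-in so the block runs without any test files.

```julia
# Run `work` repeatedly and record memory figures every `every` iterations.
function track_memory(work; iterations::Int = 10_000, every::Int = 1_000)
    samples = @NamedTuple{iteration::Int, live_mib::Float64, maxrss_mib::Float64}[]
    for n in 1:iterations
        work()
        if n % every == 0
            GC.gc()  # full collection, so live bytes only count reachable objects
            push!(samples, (iteration = n,
                            live_mib = Base.gc_live_bytes() / 2^20,
                            maxrss_mib = Sys.maxrss() / 2^20))
        end
    end
    return samples
end

# Stand-in workload; swap in `() -> matread("test/v7.3/array.mat")` to test MAT.jl.
samples = track_memory(() -> sum(rand(1_000)); iterations = 5_000, every = 1_000)
foreach(println, samples)
```

If live_mib stays flat while maxrss_mib keeps climbing, the growth is probably outside Julia's GC heap, for example in a C library. Note that Sys.maxrss() is a peak value, so it can only show growth, never a decrease.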

matthijscox avatar Nov 25 '25 08:11 matthijscox