netcdf-c
Huge memory consumption in chunk-cache after handling several opened netcdf-files with default-chunk-cache configuration
When using the default chunk-cache settings and opening several files simultaneously, the netcdf library accumulates a permanent memory consumption of up to 64MB per file and variable, even after all files have been closed (approx. 1.6GB of chunk cache for 5 files with 5 variables each).
The problem can be circumvented by using any chunk-cache size different from the default. It might be related to https://docs.unidata.ucar.edu/netcdf-c/current/nc4hdf_8c_source.html line 1155 / nc4_adjust_var_cache.
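For illustration, a minimal C sketch of the workaround; this is a sketch, not part of the original report, and it assumes the compiled-in default cache size is 16777216 bytes (the parameters match the "modified chunk-cache" values in the test output below):

    #include <netcdf.h>
    #include <stdio.h>

    int main(void)
    {
        /* one byte more than the 16MB default is enough to move the
         * library off the default-cache code path; this must be called
         * before the files are opened */
        int status = nc_set_chunk_cache(16777217, 4133, 0.75f);
        if (status != NC_NOERR)
            fprintf(stderr, "nc_set_chunk_cache: %s\n", nc_strerror(status));
        return status != NC_NOERR;
    }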
Environment: netcdf-4.9.2 and netcdf-4.8.1, tested on Linux (e.g. Ubuntu 22.04 with the default netcdf-4.8.1, or RHEL 8 with the latest netcdf-4.9.2 from conda).
The problem occurs both with python-netCDF4 and with an in-house C++ application: https://github.com/metno/fimex
A test program (in Python) is attached as test_netcdf_memusage.zip. It first creates 5 larger files (5 variables of 64MB each = 320MB) in the local directory and then reads them, once with a modified and once with the default chunk-cache size, using the basic reading function:
import netCDF4

num_vars = 5  # the test files contain 5 data variables each

def netcdf_test(paths: list):
    nc_list = []
    for f in paths:
        nc = netCDF4.Dataset(f, 'r')
        nc_list.append(nc)
        for t in range(nc["var0"].shape[0]):
            for i in range(num_vars):
                # reading a single value still pulls a whole chunk
                # into the per-variable chunk cache
                v = nc[f"var{i}"][t, 0, 0, 0]
                v = None
    # the close is done outside the for-loop to simulate simultaneously
    # opened files, as e.g. xarray.open_mfdataset
    for nc in nc_list:
        nc.close()
output:
$ python3 test_netcdf_memusage.py
after creation of files: 175MB, files: 8
modified chunk-cache: (16777217, 4133, 0.75)
memory-leak per file*variable with modified chunk-cache, netcdf4: 0MB
total: 175MB, files: 8
default chunk-cache: (16777216, 4133, 0.75)
memory-leak per file*variable with default chunk-cache, netcdf: 59MB
total: 1650MB, files: 8
So most of the data is still cached in the chunk cache, even after all files are closed and no data is held by Python/numpy.
Best regards, Heiko
Thanks! I'll take a look at this.
Looking at this, and the fix that was applied in xarray, I believe I see how we might be able to fix this. Thanks for the report, and your patience!
After further investigation, I'm at a loss as to how to address this; perhaps opening a discussion over at the netCDF-Python repository? I've tried to duplicate this issue in pure C, and have not been able to, nor have I been able to uncover any latent memory issues through static and dynamic testing. I'll keep trying, but in the meantime, my limited experience with/knowledge of Python means there isn't a lot I can do, immediately.
I've attached the test files; I've modified the provided python test script, and also the C version I was using for testing.
Thanks for looking into that. I've just compiled your C program against 4.8.1 and am running it with the following output:
$ ./test_netcdf
after creation of files: 143MB
memory-leak per file*variable with modified chunk-cache, netcdf: 0MB
total: 143MB
memory-leak per file*variable with default chunk-cache, netcdf4: 8MB
total: 339MB
The same test with the Python version gives:
$ python3 test_netcdf_memusage.py
after creation of files: 171MB, files: 4
modified chunk-cache: (16777217, 4133, 0.75)
memory-leak per file*variable with modified chunk-cache, netcdf4: 0MB
total: 171MB, files: 4
default chunk-cache: (16777216, 4133, 0.75)
memory-leak per file*variable with default chunk-cache, netcdf: 59MB
total: 1647MB, files: 4
I noticed a small difference between the Python and the C version: the hard-coded values of the cache. You set '16777216, 1000, 0.75', while the default cache of my version, according to nc_get_chunk_cache, is '16777216, 4133, 0.75'. I tried to adapt these values, but the result is still that I don't see the memory leak in the C version that is visible in the Python version.
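For reference, a small sketch of how both caches can be inspected (ncid and varid are assumed to refer to an already-open file and variable; this is illustration, not part of the attached tests):

    #include <netcdf.h>
    #include <stdio.h>

    static void print_caches(int ncid, int varid)
    {
        size_t size, nelems;
        float preemption;

        /* library-wide default applied to newly opened variables */
        if (nc_get_chunk_cache(&size, &nelems, &preemption) == NC_NOERR)
            printf("default cache: %zu bytes, %zu slots, %.2f\n",
                   size, nelems, preemption);

        /* cache actually attached to this particular variable */
        if (nc_get_var_chunk_cache(ncid, varid, &size, &nelems,
                                   &preemption) == NC_NOERR)
            printf("var cache:     %zu bytes, %zu slots, %.2f\n",
                   size, nelems, preemption);
    }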
I appreciate you double-checking in your environment; are you able to open an issue over at netcdf4-python? I'd be happy to open one and link this to it, but you may be able to provide more relevant information given your understanding of Python and the underlying test case.
I notice your test with 4.8.1 still has an 8MB memory leak per file with the default chunk sizes; I was able to replicate this amongst various versions. It looks like the fix was introduced after the v4.9.2 release. I am currently working on the first release candidate for v4.9.3, so hopefully that should be able to get this solved!
While looking again at your program, I found an issue with your code: it reads varids 0-5, while the data variables' varids are 4 to 8 (in my case, presumably because the coordinate variables occupy the first varids). An updated version can be found in test_programs.zip.
I changed

    for (int t = 0; t < NUM_VARS; ++t) {
        ...
        nc_get_var1_float(ncids[i], t, ....)

to

    for (int t = 0; t < NUM_VARS; ++t) {
        ...
        sprintf(var_name, "var%d", t);
        nc_inq_varid(ncids[i], var_name, &varid);
        nc_get_vara(ncids[i], varid, index, count, &value);
and I see the same memory consumption as with the python version:
$ ./test_netcdf
default chunk_cache: 16777216 4133 0.750000
after creation of files: 143MB
total before closing files: 143MB
memory-leak per file*variable with modified chunk-cache, netcdf: 0MB
total: 143MB
total before closing files: 1618MB
memory-leak per file*variable with default chunk-cache, netcdf4: 59MB
total: 1618MB
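For reference, a self-contained sketch of the corrected read pattern (the file names, the 4-D shape, and the NUM_FILES/NUM_VARS values are assumptions for illustration; the actual test program is in test_programs.zip):

    #include <netcdf.h>
    #include <stdio.h>

    #define NUM_FILES 5
    #define NUM_VARS  5

    int main(void)
    {
        int ncids[NUM_FILES];
        char name[32];

        /* keep all files open at once, as xarray.open_mfdataset would */
        for (int i = 0; i < NUM_FILES; ++i) {
            snprintf(name, sizeof(name), "test%d.nc", i); /* assumed names */
            if (nc_open(name, NC_NOWRITE, &ncids[i]) != NC_NOERR)
                return 1;
        }

        for (int i = 0; i < NUM_FILES; ++i) {
            for (int t = 0; t < NUM_VARS; ++t) {
                int varid;
                float value;
                size_t index[4] = {0, 0, 0, 0};
                size_t count[4] = {1, 1, 1, 1};

                /* look the variable up by name instead of assuming that
                 * data variables start at varid 0 */
                snprintf(name, sizeof(name), "var%d", t);
                if (nc_inq_varid(ncids[i], name, &varid) == NC_NOERR)
                    nc_get_vara(ncids[i], varid, index, count, &value);
            }
        }

        for (int i = 0; i < NUM_FILES; ++i)
            nc_close(ncids[i]);
        return 0;
    }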
I haven't checked v4.9.3 yet, but as far as I've seen, the default chunk sizes have been changed, and the test files' chunks are too big to fit into the v4.9.3 cache? I hope you can re-open the issue. I will close the netCDF4-python issue.
I have now compiled the latest netcdf version from git, and I don't see the huge memory consumption any longer, even after adjusting for the new chunk sizes. This resolves the issue.
Looking at the new code for nc4_adjust_var_cache, and considering that CHUNK_CACHE_SIZE == DEFAULT_CHUNK_CACHE_SIZE (at least according to my config.h):
    if (var->chunkcache.size == CHUNK_CACHE_SIZE)
        if (chunk_size_bytes > var->chunkcache.size)
        {
            var->chunkcache.size = chunk_size_bytes * DEFAULT_CHUNKS_IN_CACHE;
            if (var->chunkcache.size > DEFAULT_CHUNK_CACHE_SIZE)
                var->chunkcache.size = DEFAULT_CHUNK_CACHE_SIZE;
            if ((retval = nc4_reopen_dataset(grp, var)))
                return retval;
        }
This code block will never change var->chunkcache.size (it is made > CHUNK_CACHE_SIZE and therefore reset to CHUNK_CACHE_SIZE), so any automatic adjustment of the chunk cache has been disabled since 4.9.3, and nc4_adjust_var_cache is dead code, as far as I understand?
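A quick numeric trace of why the block is a no-op (all values assumed for illustration: both macros 16777216 as per a stock config.h, DEFAULT_CHUNKS_IN_CACHE taken as 10, one 64MB chunk):

    chunkcache.size == CHUNK_CACHE_SIZE (16777216)   -> outer if taken
    chunk_size_bytes = 67108864 > 16777216           -> inner if taken
    chunkcache.size  = 67108864 * 10 = 671088640
    671088640 > DEFAULT_CHUNK_CACHE_SIZE             -> clamped to 16777216
    net effect: chunkcache.size ends where it started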
That is correct, and thanks!