netcdf4-python
Reopening a netCDF file on disk (without closing the original) does not obtain fresh data
Users find it surprising when re-opening a modified file does not show new data.
This issue from Xarray seems to also be a netCDF4-Python issue: https://github.com/pydata/xarray/issues/4862
import shutil
import netCDF4
import numpy as np

with netCDF4.Dataset('my_data.nc', 'w') as nc:
    nc.createDimension('x', 3)
    var = nc.createVariable('foo', int, ('x',))
    var[:] = [1, 2, 3]

with netCDF4.Dataset('my_data2.nc', 'w') as nc:
    nc.createDimension('x', 3)
    var = nc.createVariable('foo', int, ('x',))
    var[:] = [4, 5, 6]

# open original data
nc = netCDF4.Dataset('my_data.nc')
original = nc.variables['foo'][:]
print(original)

# modify the file in place
shutil.copy('my_data2.nc', 'my_data.nc')

# reopen dataset, which should *not* match the original data
nc = netCDF4.Dataset('my_data.nc')
changed = nc.variables['foo'][:]
print(changed)
assert not np.array_equal(original, changed)
prints:
[1 2 3]
[1 2 3]
and then the assertion fails
I think the issue is that the nc object is not getting closed. Closing it before the file is overwritten seems to make your test pass:
....
# open original data
nc = netCDF4.Dataset('my_data.nc')
original = nc.variables['foo'][:]
nc.close()
...
and it prints:
[1 2 3]
[4 5 6]
Yes, the original file is intentionally not closed in this example. Users still find it surprising that opening the same file a second time, while the first handle is still open, does not read it from scratch and show the modified data.
Hmm - interesting
This seems to occur only when HDF5 is the underlying file format: if you add format="NETCDF3_64BIT" when the datasets are created, the test passes. This suggests it is due to the underlying C libraries, not the Python interface. However, the issue does not occur if import h5netcdf.legacyapi as netCDF4 is used.
Still no idea why this occurs or how to fix it. I'll leave this issue open in case someone has something to contribute.