Reopening a netCDF file on disk (without closing the original) does not obtain fresh data

Open • shoyer opened this issue 3 years ago • 5 comments

Users find it surprising when re-opening a modified file does not show new data.

This issue from Xarray seems to also be a netCDF4-Python issue: https://github.com/pydata/xarray/issues/4862

import shutil
import netCDF4
import numpy as np

with netCDF4.Dataset('my_data.nc', 'w') as nc:
  nc.createDimension('x', 3)
  var = nc.createVariable('foo', int, ('x',))
  var[:] = [1, 2, 3]

with netCDF4.Dataset('my_data2.nc', 'w') as nc:
  nc.createDimension('x', 3)
  var = nc.createVariable('foo', int, ('x',))
  var[:] = [4, 5, 6]

# open original data
nc = netCDF4.Dataset('my_data.nc')
original = nc.variables['foo'][:]
print(original)

# modify the file in place
shutil.copy('my_data2.nc', 'my_data.nc')

# reopen dataset, which should *not* match the original data
nc = netCDF4.Dataset('my_data.nc')
changed = nc.variables['foo'][:]
print(changed)

assert not np.array_equal(original, changed)

prints:

[1 2 3]
[1 2 3]

and then the assertion fails
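A note on the reproducer: shutil.copy overwrites my_data.nc in place. It opens the existing destination for writing and truncates it, so the path still refers to the same file on disk (same device and inode numbers on POSIX systems) rather than to a newly created file. The stdlib-only sketch below uses a hypothetical helper, overwrite_in_place_keeps_inode, and checks this with os.stat. It only illustrates the file-system side of the reproducer and does not by itself explain why the second Dataset returns stale data.

```python
import os
import shutil
import tempfile


def overwrite_in_place_keeps_inode(workdir):
    """Hypothetical check: does shutil.copy reuse the destination file?

    Returns True if dst keeps its (st_dev, st_ino) identity after the copy
    and its bytes have been replaced by the source's bytes.
    """
    dst = os.path.join(workdir, "my_data.nc")
    src = os.path.join(workdir, "my_data2.nc")
    with open(dst, "wb") as f:
        f.write(b"original bytes")
    with open(src, "wb") as f:
        f.write(b"replacement bytes")

    before = os.stat(dst)
    shutil.copy(src, dst)  # truncates and rewrites the existing dst file
    after = os.stat(dst)

    with open(dst, "rb") as f:
        contents = f.read()
    same_file = (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)
    return same_file and contents == b"replacement bytes"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(overwrite_in_place_keeps_inode(tmp))
```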

shoyer, Sep 27 '22 16:09

I think the issue is that the nc object is not getting closed; doing this seems to make your test pass:

...
# open original data
nc = netCDF4.Dataset('my_data.nc')
original = nc.variables['foo'][:]
nc.close()
...

and it prints:

[1 2 3]
[4 5 6]

akrherz, Sep 27 '22 16:09

Yes, the original file is intentionally left open in this example. Users still find it surprising that a separately opened Dataset on the same file does not re-read it from scratch and show the modified data.

shoyer, Sep 27 '22 16:09

Hmm - interesting

jswhit, Sep 27 '22 17:09

This seems to occur only when HDF5 is the underlying file format: if you add format="NETCDF3_64BIT" when the datasets are created, the test passes. This suggests the cause lies in the underlying C libraries rather than in the Python interface. However, the issue does not occur if import h5netcdf.legacyapi as netCDF4 is used (h5netcdf also reads HDF5 files, but through h5py rather than the netCDF-C library).

jswhit, Sep 27 '22 17:09

Still no idea why this occurs or how to fix it. I'll leave this issue open in case someone has something to contribute.

jswhit, Oct 11 '22 21:10