netcdf4-python
2D string variable update causes HDF Error when rereading file
Hello,
I encountered a problem when trying to save a string variable. The goal is to update part of a dataset, so I use a slice to select the relevant part of a 2D string table and then assign the new values. While this works well for integer and floating-point variables, the partial update of a string variable raises an HDF error when the file is reread (see the image after the reproducing steps).
The problem might be linked to the combination of the extensible dimension feature and the 2D case because:
- Replacing dim_0=None by dim_0=5 --> OK
- Partial update of 1D string variable --> OK
Here are the steps to reproduce:
import netCDF4
import numpy as np

with netCDF4.Dataset('broken.nc', mode='w') as handler:
    handler.createDimension("dim_0", None)
    handler.createDimension("dim_1", 5)
    handler.createVariable('var_str', str, ('dim_0', 'dim_1'), fill_value='no_data')
    handler["var_str"][2:5, 1:4] = np.full((3, 3), fill_value='foo', dtype=object)

# Error appears when triggering a netcdf close. Something might be getting corrupted somewhere
with netCDF4.Dataset('broken.nc', mode='r') as handler:
    print(handler["var_str"][...])
I work in a Conda environment installed on RHEL8 with: python=3.11, h5netcdf=1.2.0, libnetcdf=4.9.2, netcdf4=1.7.1
Since it works if you use fixed dimensions, it's likely a bug in the netcdf-c lib.
I recently got similar errors, too. They appeared when I attempted to upgrade the netcdf4 version to 1.7.1. Before, my code was running on version 1.5.8 without any errors.
netcdf4-python 1.5.8 wheels used an earlier version of the C lib (nothing in the Python interface for vlen str variables has changed).
Should I reopen this issue in the netcdf-c repository instead?
I think that would be a good idea - especially if you could translate your example into C and include that in the github issue.
You don't have to close this issue - just link it to the one in the netcdf-c repo.