netcdf4-python
scale factor and offset when appending to variable
Hi,
I'm not sure if this is a bug, but for me this was not expected behaviour. If I append to an existing variable in a netCDF file that has a scale_factor and/or an add_offset attribute, the library applies them to my data before putting it into the variable.
I assumed that I have to write the packed data into the file, and that scale_factor and add_offset are only applied when reading the data.
The following program illustrates the behaviour:
import netCDF4
import numpy as np
# Create the file and write five values before any packing attributes exist.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc', 'w') as dataset:
    dataset.createDimension('array', None)
    data = dataset.createVariable('data', np.float64, ('array',))
    data[:] = np.arange(5)
    data.scale_factor = 0.5
    data.add_offset = 5
# Append five more values now that scale_factor and add_offset are set.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc', 'a') as dataset:
    dataset.variables['data'][5:] = np.arange(5)
# Read everything back.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc') as dataset:
    data = dataset.variables['data'][:]
    print(data)
It outputs
[5. 5.5 6. 6.5 7. 0. 1. 2. 3. 4.]
If this is not considered a bug, I think it would be good to add to the documentation that scale factor and offset have to be specified before writing data and that the unpacked data should be written to be consistent.
Best Regards, Christoph
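The output reported above can be reproduced with the usual packing arithmetic, where a stored value is unpacked as stored * scale_factor + add_offset and packed as (value - add_offset) / scale_factor. The following is only a sketch in plain Python, not the library's code; the helper names pack and unpack are made up for illustration.

```python
# Hypothetical helpers that mimic the packing arithmetic; not part of netCDF4.
def pack(values, scale_factor, add_offset):
    """Physical values -> stored values (what happens on write)."""
    return [(v - add_offset) / scale_factor for v in values]


def unpack(stored, scale_factor, add_offset):
    """Stored values -> physical values (what happens on read)."""
    return [s * scale_factor + add_offset for s in stored]


scale_factor, add_offset = 0.5, 5

# First write: the attributes did not exist yet, so 0..4 were stored as-is.
stored_first = [float(v) for v in range(5)]
# Append: the attributes existed, so 0..4 were packed before being stored.
stored_second = pack(range(5), scale_factor, add_offset)

# Reading applies the unpacking to everything that is stored.
result = unpack(stored_first + stored_second, scale_factor, add_offset)
print(result)
```

The first five values come back shifted because they were stored raw and then unpacked on read, while the appended values round-trip unchanged.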
See
http://unidata.github.io/netcdf4-python/netCDF4.Variable-class.html#set_auto_maskandscale
You're not the first to be surprised by this - at the very least we should make this behaviour more prominent in the docs.
I was also surprised by this behavior, since it is very different from scipy.io.netcdf_file. I spent a long time trying to hunt down a bug due to this.
scipy.io.netcdf_file can mask and scale too, and apparently even when appending?
I assume this is a fairly recent enhancement, but it is such a time and code saver (also in netCDF4)!
I am not sure whether this is documented well enough now (here: http://unidata.github.io/netcdf4-python/#netCDF4.Dataset.set_auto_maskandscale) to close this issue?
Maybe auto_mask, auto_scale, and auto_maskandscale should be parameters on __init__ of Datasets, variables, etc., like it is for scipy.io.netcdf. Then people would become more aware of this function, too.
What seems to be missing in the docs in any case is whether auto_maskandscale is True by default on newly created/opened datasets.