netcdf4-python

scale factor and offset when appending to variable

Open cpaulik opened this issue 11 years ago • 4 comments

Hi,

I'm not sure if this is a bug, but it was not the behaviour I expected. If I append to an existing variable in a netCDF file that has a scale_factor and/or an add_offset attribute, the library applies them to my data before putting it into the variable.

I assumed that I had to write the packed data into the file, and that scale_factor and add_offset are only applied when reading the data.

The following program illustrates the behaviour:

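import netCDF4
import numpy as np

# Create the file and write 0..4 before any packing attributes exist,
# so these values are stored as-is.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc', 'w') as dataset:
    dataset.createDimension('array', None)
    # np.float was removed from recent NumPy releases; np.float64 is the same type
    data = dataset.createVariable('data', np.float64, ('array',))
    data[:] = np.arange(5)
    data.scale_factor = 0.5
    data.add_offset = 5

# Append 0..4 now that scale_factor and add_offset are set on the variable.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc', 'a') as dataset:
    dataset.variables['data'][5:] = np.arange(5)

# Read everything back with the default settings.
with netCDF4.Dataset('/media/sf_D/ubyte_test.nc') as dataset:
    data = dataset.variables['data'][:]
    print(data)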

It outputs

[5. 5.5 6. 6.5 7. 0. 1. 2. 3. 4.]
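The numbers above can be reproduced by hand. This is a minimal pure-Python sketch of the arithmetic only, assuming the usual packing convention (stored = (value - add_offset) / scale_factor on write, value = stored * scale_factor + add_offset on read). The pack and unpack helpers are hypothetical names for illustration, not library functions.

```python
# Attribute values taken from the program above.
SCALE_FACTOR = 0.5
ADD_OFFSET = 5


def pack(value):
    """Conversion assumed to happen on write when auto-scaling is on."""
    return (value - ADD_OFFSET) / SCALE_FACTOR


def unpack(stored):
    """Conversion assumed to happen on read when auto-scaling is on."""
    return stored * SCALE_FACTOR + ADD_OFFSET


# First write: the attributes did not exist yet, so 0..4 went in unchanged.
stored = [float(v) for v in range(5)]

# Append: the attributes exist now, so 0..4 are packed before being stored.
stored += [pack(v) for v in range(5)]

# Read: every stored value is unpacked, whichever way it was written.
result = [unpack(s) for s in stored]

print(stored)
print(result)
```

Under that assumption the first five values come back shifted and scaled because they were never packed, while the appended five round-trip to what was written.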

If this is not considered a bug, I think it would be good to add to the documentation that scale factor and offset have to be specified before writing data and that the unpacked data should be written to be consistent.

Best Regards, Christoph

cpaulik, Jun 24 '14 17:06

See

http://unidata.github.io/netcdf4-python/netCDF4.Variable-class.html#set_auto_maskandscale

You're not the first to be surprised by this - at the very least we should make this behaviour more prominent in the docs.

jswhit, Jun 24 '14 17:06

I was also surprised by this behavior, since it is very different from scipy.io.netcdf_file. I spent a long time trying to hunt down a bug due to this.

rabernat, Oct 05 '14 16:10

scipy.io.netcdf_file can do maskandscale too, and it seems to work even when appending?

I assume this is a fairly recent enhancement, but it is such a time and code saver (also in netCDF4)!

j08lue, Jun 18 '16 19:06

I am not sure whether this is documented well enough now (here: http://unidata.github.io/netcdf4-python/#netCDF4.Dataset.set_auto_maskandscale) to close this issue?

Maybe auto_mask, auto_scale, and auto_maskandscale should be parameters on __init__ of Datasets, variables, etc., like it is for scipy.io.netcdf. Then people would become more aware of this function, too.

What seems to be missing in the docs in any case is whether auto_maskandscale is True by default on newly created/opened datasets.

j08lue, Jun 18 '16 19:06