netcdf4-python

Apply scale and offset before checking valid_min/valid_max

Open knutfrode opened this issue 7 years ago • 8 comments

In the latest versions, data is masked when it falls outside <valid_min, valid_max>. But apparently this does not work for variables where a scale factor and offset are given, since the validity checks seem to be performed on the raw (unscaled) data.

knutfrode avatar Aug 22 '17 13:08 knutfrode

This is intentional - the valid_min/valid_max checks are done on the integer data as it is actually stored in the netcdf file. This is consistent with missing_value and _FillValue, which also have the native (integer) type when scale/offset packing is used.
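The order of operations described here can be illustrated with a small NumPy sketch. It is not netCDF4-python's actual code; the array contents, `scale_factor`, `add_offset`, and valid range below are made-up values chosen only to show that the range check runs on the stored integers and that unpacking happens afterwards.

```python
import numpy as np

# Hypothetical packed variable: int16 storage with made-up attributes.
packed = np.array([-5, 0, 10, 200, 300], dtype=np.int16)
scale_factor, add_offset = 0.1, 20.0
valid_min, valid_max = np.int16(0), np.int16(200)  # same type as the stored data

# Step 1: mask against the valid range using the stored integers.
masked = np.ma.masked_outside(packed, valid_min, valid_max)

# Step 2: unpack only afterwards; masked entries stay masked.
unpacked = masked * scale_factor + add_offset

print(unpacked)
```

Because the comparison is integer against integer, no floating point rounding is involved at the edges of the range.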

jswhit avatar Aug 22 '17 14:08 jswhit

Ok, that is good to know. But this then means that there are quite a few netCDF files "out there" where valid_min/valid_max mistakenly refer to the "scaled and offsetted" values, and not the integer values. One example (in my case) is the ROMS ocean model.

Thus a highly welcome feature would be the possibility to use a switch to perform scale/offset before the min/max checking (or rather, scale/offset the min/max-values before masking).

knutfrode avatar Aug 22 '17 14:08 knutfrode

I don't think it makes sense to specify valid_min/valid_max with a type that is different than the netcdf variable. What you're asking for is a floating point valid range for short integer data. I foresee all kinds of floating point comparison issues at the edge of the range.

Do these ROMS files specify the missing_value as a floating point type also?

jswhit avatar Aug 22 '17 16:08 jswhit

Here is an example of a variable which is returned as masked values with netCDF 1.3.0:

short Cs_w(s_w) ;
        Cs_w:long_name = "S-coordinate stretching curves at W-points" ;
        Cs_w:valid_min = -1. ;
        Cs_w:valid_max = 0. ;
        Cs_w:field = "Cs_w, scalar" ;
        Cs_w:scale_factor = -1.52597204419215e-05 ;
        Cs_w:add_offset = -0.5 ;

Cs_w should have values between -1 and 0, but the packed values (shorts) are integers in the range <-32000,32000>, and are thus all being masked. So this is perhaps a post-processing issue: the file might have been packed afterwards (ncpdq?), and the valid_min/valid_max were then not updated as they should have been? After setting var.set_auto_mask(False) in my script, this variable is read properly as before.
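A rough NumPy sketch of why this happens, using the attribute values from the Cs_w header above. The packed array is hypothetical sample data, not the contents of the actual file.

```python
import numpy as np

# Attribute values copied from the Cs_w header above.
scale_factor = -1.52597204419215e-05
add_offset = -0.5
valid_min, valid_max = -1.0, 0.0

# Hypothetical packed contents: a few shorts spread over the int16 range.
packed = np.array([-32000, -16000, -1, 0, 16000, 32000], dtype=np.int16)

# Range check on the raw shorts: only -1 and 0 fall inside [-1, 0].
mask_on_packed = (packed < valid_min) | (packed > valid_max)

# Range check after unpacking: every value lands in [-1, 0], so nothing is masked.
unpacked = packed * scale_factor + add_offset
mask_on_unpacked = (unpacked < valid_min) | (unpacked > valid_max)

print(mask_on_packed)
print(mask_on_unpacked)
```

For files like this one, a possible workaround is to call `set_auto_mask(False)` on the variable, as mentioned above, and then apply the range check by hand on the returned (unpacked) values, for example with `np.ma.masked_outside`.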

However, for some reason this variable in the same file is being read fine:

short u(ocean_time, s_rho, eta_u, xi_u) ;
        u:long_name = "time-averaged u-momentum component" ;
        u:units = "meter second-1" ;
        u:time = "ocean_time" ;
        u:coordinates = "lon_u lat_u s_rho ocean_time" ;
        u:field = "u-velocity, scalar, series" ;
        u:_FillValue = 1.e+37f ;
        u:scale_factor = -2.378769e-05f ;
        u:add_offset = 0.3410598f ;

whereas e.g. this variable has a lot of masked values, which were not masked in earlier versions (e.g. 1.2.4) of netCDF-library:

short temp(ocean_time, s_rho, eta_rho, xi_rho) ;
        temp:long_name = "time-averaged potential temperature" ;
        temp:units = "Celsius" ;
        temp:time = "ocean_time" ;
        temp:coordinates = "lon_rho lat_rho s_rho ocean_time" ;
        temp:field = "temperature, scalar, series" ;
        temp:_FillValue = 1.e+37f ;
        temp:scale_factor = -0.0001446465f ;
        temp:add_offset = 3.502546f ;

It seems like values above ca 3.5 (i.e. similar to offset value) are being masked, for some reason. But there is no valid_min/valid_max here?

knutfrode avatar Aug 22 '17 17:08 knutfrode

I see that the _FillValue is 1.e37 - this will get cast to the data type of the variable (short integer), resulting in all the zero values in the array being masked.

_FillValue, missing_value, valid_min, valid_max should all have the same data type as the netcdf variable. This is a bug in the ROMS files.
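This would also account for the masked values near 3.5 in temp: a packed 0 unpacks to exactly add_offset. The sketch below uses the attributes from the temp header above and takes it as an assumption, following the explanation above, that the fill value compares equal to 0 once cast to short. The packed array is hypothetical sample data.

```python
import numpy as np

# Attribute values copied from the temp header above.
scale_factor = np.float32(-0.0001446465)
add_offset = np.float32(3.502546)

# Assumption taken from the explanation above: after the cast to short,
# the fill value compares equal to 0.
fill_as_short = np.int16(0)

# Hypothetical packed data containing some zeros.
packed = np.array([-1000, 0, 0, 1000], dtype=np.int16)
masked = np.ma.masked_equal(packed, fill_as_short)

# A packed 0 unpacks to add_offset, i.e. about 3.5 degrees Celsius.
value_at_zero = 0 * scale_factor + add_offset

print(masked)
print(value_at_zero)
```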

jswhit avatar Aug 22 '17 17:08 jswhit

Ok, then it starts making sense. Thank you for clarifying.

knutfrode avatar Aug 22 '17 18:08 knutfrode

I can see how this can be confusing and can create hard-to-debug errors. Perhaps we should refuse to do the auto-scaling and masking (and throw an exception instead) if the attribute data types don't match the variable data type.

jswhit avatar Aug 23 '17 15:08 jswhit

With pull request #708 a warning will be issued if the _FillValue cannot be safely cast to the variable data type, and it will not be used to mask the returned data.
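The kind of check described here could look roughly like the sketch below. This is not the code from the pull request: `fill_value_fits` and `usable_fill_value` are hypothetical helpers, and the rule used (the value must be a whole number within the integer type's range) is an assumption about what "safely cast" means for integer variables.

```python
import warnings
import numpy as np

def fill_value_fits(fill_value, dtype):
    """Return True if fill_value is exactly representable in an integer dtype.

    Hypothetical helper for illustration; not part of netCDF4-python.
    """
    info = np.iinfo(dtype)
    value = float(fill_value)
    # NaN and infinity fail is_integer(); huge floats fail the range test.
    return value.is_integer() and info.min <= value <= info.max

def usable_fill_value(fill_value, dtype):
    """Return the fill value cast to dtype, or None (with a warning) if unsafe."""
    if not fill_value_fits(fill_value, dtype):
        warnings.warn(
            f"_FillValue {fill_value!r} cannot be safely cast to "
            f"{np.dtype(dtype)}; it will not be used for masking"
        )
        return None
    return np.dtype(dtype).type(int(float(fill_value)))

# The ROMS-style fill value is rejected; a proper short fill value is kept.
print(fill_value_fits(np.float32(1e37), np.int16))
print(usable_fill_value(-32767, np.int16))
```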

jswhit avatar Sep 07 '17 21:09 jswhit