netcdf4-python

multi character FillValue for S1 variables

Open johnmund opened this issue 5 years ago • 7 comments

python 3.7, netcdf4 (1.5.1.2)

We are struggling to update some old python2.7 code that used 'c' variable types for char arrays. When creating a variable of type 'S1', is it possible to have a fill value of 'NA'?

#!/usr/bin/env python3
import netCDF4
import numpy
nc = netCDF4.Dataset('stringtest.nc','w',format='NETCDF4')
nc.createDimension('nchars',3)
nc.createDimension('nstrings',3)
v = nc.createVariable('strings','S1',('nstrings','nchars'),fill_value="NA")
v._Encoding = 'ascii'
v[0]='FOO'
v[1]='BAR'
print("Fill Value:",v._FillValue)
print("strings var:",v[:])

#./test.py
Fill Value: b'N'
strings var: ['FOO' 'BAR' 'NNN']

Is this expected behavior? We could switch to str variables but are unsure of the potential ramifications for downstream users.

johnmund • Mar 19 '20 20:03

The fill_value has to be of the same type as the variable, so you can't have a two character fill_value for a single character variable.

There is some magic happening in the python interface to convert an nstrings-by-nchars array of char variables to an array of strings of length nchars. It's stored as single chars in the netcdf file though.

jswhit • Mar 19 '20 21:03
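
The relationship between an array of fixed-width strings and an nstrings-by-nchars array of single characters can be illustrated with NumPy alone. This is only a sketch of the memory layout, not the conversion netcdf4-python actually performs; the library ships its own helpers for this, `netCDF4.stringtochar` and `netCDF4.chartostring`. The variable names below are made up for the illustration.

```python
import numpy as np

# Two fixed-width byte strings: a 1-D array whose items are 3 bytes each.
strings = np.array([b"FOO", b"BAR"], dtype="S3")

# Reinterpret the same bytes as single characters and arrange them
# as (nstrings, nchars), which is how a char variable is laid out.
chars = strings.view("S1").reshape(strings.shape[0], 3)
print(chars)

# Going the other way joins each row of characters back into one string.
joined = np.ascontiguousarray(chars).view("S3").reshape(-1)
print(joined)
```

Each element of `chars` holds exactly one byte, which is why a per-element fill value for such a variable can only be one character wide.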

Thank you for the quick reply! That makes sense.

Is this a change in behavior or are we not converting our legacy code correctly? It used to work (in prior versions of python/netcdf) when using datatype 'c', but that now gives an error "AttributeError: NetCDF: Invalid argument". Is there a different way to initialize a character array that would allow a 2 character fill_value or is the only option to use datatype str?

johnmund • Mar 19 '20 23:03

Can't see how it ever would have worked - the C library requires that the fill_value be the same type as the variable. It would be like trying to use a 64-bit integer fill_value for a 16-bit integer variable - it just won't fit.

jswhit • Mar 19 '20 23:03
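
The "won't fit" point can be seen with NumPy directly. The sketch below assumes the fill value is coerced to the variable's one-character dtype at some point, which would be consistent with the `b'N'` printed in the original report; it does not show netcdf4-python's internals.

```python
import numpy as np

# A one-character dtype has room for exactly one byte per element,
# so NumPy keeps only the first character of "NA".
fill = np.array("NA", dtype="S1")
print(fill)           # the stored value is a single byte
print(fill.itemsize)  # size in bytes of one element

# The integer analogy: a 16-bit integer cannot represent a value
# that needs more than 16 bits.
too_big = 2**40
fits = np.iinfo(np.int16).min <= too_big <= np.iinfo(np.int16).max
print(fits)
```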

I guess you could manually pre-fill your array with "N"s, "A"s and blanks so the rows of the matrix are "NA ".

jswhit • Mar 19 '20 23:03

Here's what I mean:

import netCDF4
nc = netCDF4.Dataset('stringtest.nc','w',format='NETCDF4')
nc.createDimension('nchars',3)
nc.createDimension('nstrings',3)
v = nc.createVariable('strings','S1',('nstrings','nchars'))
v._Encoding = 'ascii'
v[:]='NA '
v[0]='FOO'
v[1]='BAR'
nc.close()

which produces

netcdf stringtest {
dimensions:
	nchars = 3 ;
	nstrings = 3 ;
variables:
	char strings(nstrings, nchars) ;
		strings:_Encoding = "ascii" ;
data:

 strings =
  "FOO",
  "BAR",
  "NA " ;
}

jswhit • Mar 21 '20 02:03
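
If the character dimension may change, the blank-padded placeholder used in `v[:]='NA '` can be computed rather than typed by hand. The following is a minimal sketch using only NumPy; `padded_fill` is a hypothetical helper name, and the code prepares in-memory values without writing a netCDF file.

```python
import numpy as np

def padded_fill(fill, nchars):
    """Hypothetical helper: pad `fill` with blanks to exactly nchars bytes."""
    if len(fill) > nchars:
        raise ValueError("fill string is longer than the character dimension")
    return fill.ljust(nchars).encode("ascii")

nstrings, nchars = 3, 3

# Start with every row set to the padded placeholder, then overwrite real data.
rows = np.full(nstrings, padded_fill("NA", nchars), dtype=f"S{nchars}")
rows[0] = b"FOO"
rows[1] = b"BAR"
print(rows)

# On reading, a row that equals the placeholder after stripping blanks
# is treated as missing.
is_missing = [row.strip() == b"NA" for row in rows.tolist()]
print(is_missing)
```

Because the placeholder is ordinary data rather than a `_FillValue` attribute, downstream readers need to be told which string marks a missing row.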

Thank you for the example and suggestions. We'll either do as you suggest to pre-fill or change/remove the custom fill value.

It is odd though, because we certainly had NA fill values in prior versions. I was able to recreate the following using the 'c' datatype in netCDF4 4.1.1 on an old python install:


$ python --version
Python 2.6.6

$ cat test.py
#!/usr/bin/env python
import netCDF4
import numpy
nc = netCDF4.Dataset('stringtest.nc','w',format='NETCDF4')
nc.createDimension('nchars',3)
nc.createDimension('nstrings',3)
v = nc.createVariable('strings','c',('nstrings','nchars'),fill_value='NA')
print("Fill Value:",v._FillValue)

$ python test.py
('Fill Value:', u'NA')


It doesn't really matter what it used to do though, we need to make it work with the current code base so thank you for your suggestions.

johnmund • Mar 22 '20 19:03

I think there may have been a change in the C lib to enforce _FillValue (and missing_value) having the same type as the variable.

jswhit • Mar 28 '20 13:03