xarray can't identify time units in HSDS dataset
In this notebook https://gist.github.com/rsignell-usgs/07143a5ab54afb8ad6eb1af255d025c9 we use xarray to open a local netCDF4 file and then the same dataset that was 'hsload'ed to HSDS. xarray automatically recognizes the CF-compliant time units and converts the time coordinate to datetime, so the plot is correctly labeled in cell [6]. But time is not recognized for the HSDS dataset plot in cell [5]. Any idea what the problem is?
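For reference, a minimal sketch (not the gist itself; the toy datasets and the units string are illustrative) showing that xarray's conversion to datetime hinges entirely on a CF "units" attribute being present on the time coordinate:

import numpy as np
import xarray as xr

# Two toy datasets: one whose time coordinate carries a CF "units" attribute,
# one without (roughly what the HSDS copy looks like to xarray).
with_units = xr.Dataset(
    coords={"time": ("time", np.arange(3), {"units": "seconds since 1970-01-01 00:00:00"})}
)
without_units = xr.Dataset(coords={"time": ("time", np.arange(3))})

print(xr.decode_cf(with_units)["time"].dtype)     # datetime64[ns] -- plot axis gets date labels
print(xr.decode_cf(without_units)["time"].dtype)  # stays an integer dtype -- raw numbers, no date labels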
Would it be possible to print out what xarray thinks of that variable from the two sources? Have two cells with ds2['TMP_2maboveground'] and ds['TMP_2maboveground'].
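Something like this (adding .attrs so the attribute dicts from the two sources can be compared directly):

print(ds['TMP_2maboveground'])
print(ds['TMP_2maboveground'].attrs)

print(ds2['TMP_2maboveground'])
print(ds2['TMP_2maboveground'].attrs)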
@ajelenak-thg, yes, it looks like HSDS is dropping the variable attributes: https://gist.github.com/rsignell-usgs/dbe88df42e1181827363a8348016f28b
BTW, you should be able to run this notebook (at least the HSDS and DAP access cells) -- you just need a username and password for this XSEDE endpoint from @jreadey in your ~/.hscfg, right?
If you sign up for XSEDE I can add you to my project, in case that becomes useful later on.
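For reference, the ~/.hscfg file just needs the endpoint and credentials, roughly like this (placeholder values, not the real XSEDE endpoint):

hs_endpoint = https://hsds.example.org:5101
hs_username = your_username
hs_password = your_password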
Seems like some attributes of the time coordinate got lost "in translation" to HSDS. According to the DAS response from the THREDDS server:
time {
String units "seconds since 1970-01-01 00:00:00.0 0:00";
String long_name "verification time generated by wgrib2 function verftime()";
Float64 reference_time 1.4832288E9;
Int32 reference_time_type 0;
String reference_date "2017.01.01 00:00:00 UTC";
String reference_time_description "kind of product unclear, reference date is variable, min found reference date is given";
String time_step_setting "auto";
Float64 time_step 3600.0;
Int32 _ChunkSizes 512;
}
_ChunkSizes does not really exist as an attribute, I think, because netCDF tools typically display HDF5 dataset creation properties as system attributes (prefixed with _).
The HSDS response about the attributes of the time coordinate shows only these (HDF5 dimension scale-related attributes not included): _Netcdf4Dimid, reference_time, reference_time_type, time_step. No units attribute, so no conversion to datetime.
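That can be confirmed directly with h5pyd, bypassing xarray (the domain path below is a placeholder for the hsload target):

import h5pyd

# Open the HSDS domain and list the time coordinate's attributes.
with h5pyd.File("/home/username/example.nc", "r") as f:
    for name, value in f["time"].attrs.items():
        print(name, "=", value)
# Expected, per the above: _Netcdf4Dimid, reference_time, reference_time_type,
# time_step -- and no "units", so xarray has nothing to decode.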
@ajelenak-thg and @jreadey, yes, HSDS is losing nearly all variable attributes!
The variable in the original NC file has attributes:
float TMP_2maboveground(time, latitude, longitude) ;
TMP_2maboveground:_FillValue = 9.999e+20f ;
TMP_2maboveground:least_significant_digit = 2 ;
TMP_2maboveground:short_name = "TMP_2maboveground" ;
TMP_2maboveground:long_name = "Temperature" ;
TMP_2maboveground:level = "2 m above ground" ;
TMP_2maboveground:units = "K" ;
while in HSDS, the only remaining attribute is:
Attributes:
least_significant_digit: [2]
Does this mean perhaps that HSDS is only handling attributes with integer values or something?
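A quick way to see exactly what gets dropped, without going through xarray, is to diff the attribute names between the local file (h5py) and the HSDS copy (h5pyd) -- file and domain names below are placeholders:

import h5py
import h5pyd

local = h5py.File("example.nc", "r")                   # original netCDF4 file (placeholder name)
remote = h5pyd.File("/home/username/example.nc", "r")  # hsload'ed copy (placeholder domain)

# Compare attribute *names* only, which avoids having to decode the values.
local_names = set(local["TMP_2maboveground"].attrs)
remote_names = set(remote["TMP_2maboveground"].attrs)
print("dropped by hsload:", sorted(local_names - remote_names))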
@rsignell-usgs - do you see any errors during the import (with hsload)?
I've seen this issue: https://github.com/h5py/h5py/issues/719 come up when loading NetCDF files.
Oh yes, I got tons of errors on hsload.
Looks like the real problem is here: https://github.com/h5py/h5py/issues/719#issuecomment-238070297 :
The issue here is that recent versions of netCDF-C save the NC_CHAR dtype as fixed length UTF8 strings, which h5py cannot read.
So maybe hsload could translate NC_CHAR dtypes into something that h5pyd can read?
I don't know if it is possible to get the bytes for such attributes somehow and avoid h5py until that issue is resolved.
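One possible stop-gap (just a sketch, not something hsload currently does): read the attributes with netCDF4-python, whose netCDF-C backend has no trouble with NC_CHAR / fixed-length UTF-8, and copy anything missing onto the HSDS dataset with h5pyd. File and domain names are placeholders.

from netCDF4 import Dataset
import h5pyd

src = Dataset("example.nc")                         # local file (placeholder name)
dst = h5pyd.File("/home/username/example.nc", "a")  # HSDS copy (placeholder domain)

var = src.variables["TMP_2maboveground"]
for name in var.ncattrs():
    # Copy only the attributes that hsload failed to transfer.
    if name not in dst["TMP_2maboveground"].attrs:
        dst["TMP_2maboveground"].attrs[name] = var.getncattr(name)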
I have just created a PR with a fix for this problem: h5py/h5py#988. It works for the netCDF file used here. Let's see what happens.