datacube-core
nodata for product definition doesn't match dtype
Related to: https://github.com/GeoscienceAustralia/dea-config/issues/1110
The product row in the database (id 247, metadata_type_ref 1, added 2020-09-25 by ows, last updated 2021-07-01), with its definition pretty-printed:

{
  "name": "ga_srtm_dem1sv1_0",
  "storage": {
    "crs": "EPSG:4326",
    "resolution": {"latitude": -0.00027777777778, "longitude": 0.00027777777778}
  },
  "metadata": {
    "platform": {"code": "Space Shuttle Endeavour"},
    "instrument": {"name": "SIR"},
    "product_type": "DEM"
  },
  "description": "DEM 1sec Version 1.0",
  "measurements": [
    {"name": "dem",   "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
    {"name": "dem_s", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000},
    {"name": "dem_h", "dtype": "float32", "units": "metre", "nodata": -340282350000000000000000000000000000000}
  ],
  "metadata_type": "eo"
}
Labels: eo3-validate, type handling needs update
Our validator is correctly failing the product --- should this issue go to ODC core?
@SpacemanPaul suggested this is not core's issue; it was just transferred here from core.
I'm not that strongly opinionated on the matter.
What would the metadata fix be, though? -340282350000000000000000000000000000000 -> -3.4028235e+38?
What does a nodata value on a floating point variable even mean?
The supported nodata type can be any number, per the ODC schema (https://github.com/opendatacube/datacube-core/blob/develop/datacube/model/schema/dataset-type-schema.yaml#L97-L100):

nodata:
  oneOf:
    - type: number
    - enum: [NaN, Inf, -Inf]

The dtype restriction is here: https://github.com/opendatacube/datacube-core/blob/develop/datacube/model/schema/dataset-type-schema.yaml#L54-L55
https://github.com/opendatacube/datacube-core/pull/1347
What does a nodata value on a floating point variable even mean?
Same as for an int: a bit pattern reserved to mean the absence of a valid observation. The fact that float32 already has standard bit patterns that mean exactly that doesn't stop some from using finite values instead, sometimes alongside proper floating-point NaNs. All masking code is significantly complicated by float nodata, which could be nan, some finite float value, or absent (which is the same as nan).
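To illustrate that complication, here is a minimal sketch (not datacube's actual implementation) of a valid-data mask for a float band that has to cover all three conventions:

import numpy as np

def valid_mask(data, nodata):
    """Return True where pixels hold real observations (sketch only)."""
    if nodata is None:
        # no sentinel recorded: only NaN marks missing data
        return ~np.isnan(data)
    nodata = float(nodata)
    if np.isnan(nodata):
        # the sentinel is NaN itself
        return ~np.isnan(data)
    # finite sentinel: treat both the sentinel and stray NaNs as invalid,
    # since the two conventions sometimes appear in the same product
    return ~(np.isnan(data) | (data == nodata))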
The real value is probably -340282346638528859811704183484516925440, which is numpy.finfo(numpy.float32).min, but it lost precision when converted to a string via scientific notation.
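A quick demonstration of the precision loss (the constants below are the exact float32 minimum and the truncated value from the product definition):

import numpy as np

stored = -340282350000000000000000000000000000000   # truncated value from the product definition
precise = int(np.finfo(np.float32).min)             # -340282346638528859811704183484516925440

assert np.float32(stored) == np.float32(precise)    # both decimals parse to the same float32
assert int(np.float32(precise)) == precise          # the exact value round-trips losslessly
assert int(np.float32(stored)) != stored            # the truncated value does not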
Our validator is correctly failing the product --- should this issue go to ODC core?
@jeremyh
The test in numpy_value_fits_dtype:

return np.all(np.array([value], dtype=dtype) == [value])

doesn't check what it claims: it checks that the value can round-trip to numpy and back, which is only appropriate for integer types. The value as supplied is probably incorrect because of precision loss, but it is surely not outside the valid range for a float32 number.
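To make that concrete: the round-trip test rejects the truncated value even though it casts to a perfectly finite float32. A hypothetical range-style check for float dtypes (a sketch only, not what core implements) would accept it:

import numpy as np

def numpy_value_fits_dtype(value, dtype):
    # the round-trip check as currently written: exact equality after casting
    return np.all(np.array([value], dtype=dtype) == [value])

def float_value_fits_dtype(value, dtype):
    # hypothetical alternative for float dtypes: NaN/Inf are fine, and a
    # finite value fits as long as casting it doesn't overflow to infinity
    v64 = np.float64(value)
    if not np.isfinite(v64):
        return True
    return bool(np.isfinite(v64.astype(dtype)))

truncated = -340282350000000000000000000000000000000
print(numpy_value_fits_dtype(truncated, "float32"))   # False: fails the exact round-trip
print(float_value_fits_dtype(truncated, "float32"))   # True: casts to a finite float32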
Checking the dataset in the middle of the ocean:
import datacube
dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
product="ga_srtm_dem1sv1_0",
x=(x - buffer_x, x + buffer_x),
y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)
ds
The output is full of the stored nodata value:
All three variables (dem, dem_s, dem_h) are (time, latitude, longitude) float32 bands showing -3.403e+38 ... -3.403e+38, i.e. every pixel holds the fill value:

array([[[-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38],
        ...,
        [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, ...,
         -3.4028235e+38, -3.4028235e+38, -3.4028235e+38]]], dtype=float32)

(the dem_s and dem_h arrays are identical)
ds.dem[0][1][1]

<xarray.DataArray 'dem' ()>
array(-3.4028235e+38, dtype=float32)
Coordinates:
    time         datetime64[ns] 2014-12-15T14:58:44
    latitude     float64 -37.23
    longitude    float64 136.4
    spatial_ref  int32 4326
Attributes:
    units:         metre
    nodata:        -340282350000000000000000000000000000000
    crs:           EPSG:4326
    grid_mapping:  spatial_ref
Example of datacube's mask_invalid_data function failing to mask nodata for this product. If this worked as intended, I would expect all -3.4028235e+38 values in the dataset to be masked out and set to np.NaN:
import datacube
from datacube.utils.masking import mask_invalid_data
dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
product="ga_srtm_dem1sv1_0",
x=(x - buffer_x, x + buffer_x),
y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)
# Attempt to mask invalid data
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
Nodata values are not correctly replaced with np.NaN:
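The likely mechanics of the failure: masking compares each pixel against the nodata attribute, and the truncated decimal from the product spec is simply not equal to the exact float32 fill value in the array. A minimal check, assuming the ds loaded above:

nodata = ds.dem_h.attrs["nodata"]      # -340282350000000000000000000000000000000 (truncated)
pixel = ds.dem_h.values.ravel()[0]     # -3.4028235e+38 (exact float32 minimum)
print(float(pixel))                    # -3.4028234663852886e+38
print(float(pixel) == float(nodata))   # False, so the nodata comparison never matches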
@robbibt can you try patching the .nodata value on the xarray variable to -340282346638528859811704183484516925440 and then try masking again? That value might also pass validation, as it should round-trip to/from numpy without loss.

What about the nodata tags in the imagery itself, what are they set to? Also, is nodata coming from the files or from load? load will place whatever incorrect rounded value is recorded in the product spec, and separately from that, the data itself might contain non-rounded nodata values.
@Kirill888 That works!
import datacube
from datacube.utils.masking import mask_invalid_data
dc = datacube.Datacube()
x = 136.43551
y = -37.25564
buffer_x = 0.03
buffer_y = 0.03
query = dict(
product="ga_srtm_dem1sv1_0",
x=(x - buffer_x, x + buffer_x),
y=(y - buffer_y, y + buffer_y),
)
ds = dc.load(**query)
# Patch the nodata attribute with the precise float32 value, then mask
ds.dem_h.attrs['nodata'] = -340282346638528859811704183484516925440
ds_masked = mask_invalid_data(ds.dem_h)
ds_masked
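An equivalent patch that avoids hard-coding the 39-digit integer (same session assumed) is to pull the exact value from numpy:

import numpy as np

ds.dem_h.attrs["nodata"] = float(np.finfo(np.float32).min)
ds_masked = mask_invalid_data(ds.dem_h)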
What about nodata tags in the imagery itself
Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?
Not sure what you mean by this, do you mean the nodata attributes on the different bands of the dataset?

Is there a nodata marker on the TIFF image itself (rio info path_to_image), and is it set to the correct value, or is it truncated like in the product definition?
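For reference, the same tag can be inspected from Python with rasterio (the path below is a placeholder):

import rasterio

# hypothetical local copy of one of the product's TIFFs
with rasterio.open("path_to_image.tif") as src:
    print(src.dtypes)   # per-band dtypes, e.g. ('float32',)
    print(src.nodata)   # the nodata tag stored in the file itself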
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.