datacube-core

Important fix proposal: Implement setting "nan" nodata value for floating point rasters with no nodata attribute

Open robbibt opened this issue 2 months ago • 14 comments

Hi all, as discussed internally within DEA, we recently encountered an issue where floating point rasters are not displayed correctly in ESRI software because they lack an explicitly set nodata attribute of "nan". This differs from open-source software such as QGIS and GDAL, which correctly interpret nodata by implicitly treating a missing nodata attribute as equivalent to "nan" (edit: this turns out not to be the case: GDAL overview generation is also affected by this issue).

While this issue has been resolved by patching previously generated data on prod S3, a more sustainable long-term fix would be to set nodata attributes in our GeoTIFF writing code, to ensure that any floating point raster without a custom nodata attribute is assigned a nodata attribute of "nan".

The datacube.utils.cog.write_cog and datacube.utils.cog.to_cog functions are important places to fix this, as they are used in the generation of almost all ODC products. A possible fix would likely involve a minor change similar to the pseudocode below somewhere here:

if (raster is floating point dtype) AND (no nodata attribute set):
    raster.nodata = "nan"
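
To make the pseudocode above a bit more concrete, here is a minimal Python sketch of the same rule. It is only an illustration: `resolve_nodata` is a hypothetical helper name, not an existing datacube function, and it assumes the caller has already looked up any existing nodata value (for example from the raster's attributes) and passes `None` when there is none.

```python
import math

import numpy as np


def resolve_nodata(dtype, nodata=None):
    """Pick the nodata value to write for a raster of the given dtype.

    Hypothetical helper for illustration only (not part of datacube):
    - if a nodata value was supplied, keep it unchanged;
    - if none was supplied and the dtype is floating point, default to NaN;
    - otherwise (e.g. integer rasters) leave nodata unset (None).
    """
    if nodata is None and np.issubdtype(np.dtype(dtype), np.floating):
        return math.nan
    return nodata


# Small in-memory arrays standing in for real rasters
float_raster = np.zeros((2, 2), dtype="float32")
int_raster = np.zeros((2, 2), dtype="int16")

nodata_float = resolve_nodata(float_raster.dtype)           # float, no nodata set
nodata_int = resolve_nodata(int_raster.dtype)               # integer, no nodata set
nodata_custom = resolve_nodata(float_raster.dtype, -999.0)  # custom nodata kept
```

In this sketch only the first case changes behaviour: the float raster with no nodata gets NaN, while integer rasters and rasters with a custom nodata value are left alone.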

The alternative to fixing it in the GeoTIFF writing code would be to make sure all downstream product workflows manually set nodata values. However, in my opinion this would be difficult to document and would be highly likely to be missed in the future, leading to more expensive remediation. Fixing it at the source seems like a better option, especially as extensive testing has revealed no downstream impacts of the change while successfully solving the issue for ESRI users (a large proportion of our user base).

robbibt, Jun 05 '24 04:06