
Using NetCDF files

lucapaganotti opened this issue 2 years ago • 5 comments

Expected behaviour

When I define a product and a dataset and then query data via a simple Jupyter notebook, I would like to see the same coordinate values defined in the .nc file.

Actual behaviour

When I define a product and a dataset and then query data via a simple Jupyter notebook, I don't see the same coordinate values defined in the .nc file.

Comparing coordinate values in EPSG:32632, I don't get the same values contained in the .nc file.

script output:

***** DATASETS
[Dataset <id=94cf3426-6200-44c8-b1b2-f4e37e850791 product=c_o3_no_z location=file:///home/buck/dev/odc/data/netcdf/c_O3_dataset.odc-metadata.yaml>]

***** PRODUCTS
[{'name': 'c_o3_no_z', 'description': 'ARPA Ozone cover.', 'license': None, 'default_crs': CRS('EPSG:32632'), 'default_resolution': (4000.0, 4000.0), 'dataset_count': 1}]

***** MEASUREMENTS
[{'product': 'c_o3_no_z', 'measurement': 'c_o3', 'name': 'c_o3', 'dtype': 'float64', 'units': 'ppb', 'nodata': -9999, 'aliases': ['c_O3']}]

***** RESULTS
<xarray.Dataset>
Dimensions:      (time: 1, y: 59, x: 61)
Coordinates:
  * time         (time) datetime64[ns] 2015-04-10T13:30:00
  * y            (y) float64 4.942e+06 4.946e+06 4.95e+06 ... 5.17e+06 5.174e+06
  * x            (x) float64 4.58e+05 4.62e+05 4.66e+05 ... 6.94e+05 6.98e+05
    spatial_ref  int32 32632
Data variables:
    c_O3         (time, y, x) float64 51.42 53.28 47.53 ... 6.966 17.75 27.7
Attributes:
    crs:           EPSG:32632
    grid_mapping:  spatial_ref

Steps to reproduce the behaviour


The Python script I'm using is:

import datacube
from datacube.api.query import Query

dc = datacube.Datacube(app="arpa_analysis")

# Build search terms for the product over the time range of interest.
query = Query(product='c_o3_no_z', time=('2015-04-10T02:00:00', '2015-04-11T01:00:00'))

datasets = dc.find_datasets(**query.search_terms)
print('\n***** DATASETS')
print(datasets)

# List products as plain dicts, including per-product dataset counts.
products = dc.list_products(with_pandas=False, dataset_count=True)
print('\n***** PRODUCTS')
print(products)

# List measurements (including archived ones) as plain dicts.
measurements = dc.list_measurements(show_archived=True, with_pandas=False)
print('\n***** MEASUREMENTS')
print(measurements)

# Load the measurement on the product's default CRS/resolution grid.
result = dc.load(product='c_o3_no_z',
                 measurements=['c_O3'])
print('\n***** RESULTS')
print(result)

dc.close()

Environment information

  • Which datacube --version are you using?

  • Open Data Cube core, version 1.8.6

  • What datacube deployment/environment are you running against?

  • Following the guides I've set up a conda environment using Python 3.8.13 and Jupyter notebooks

I've defined a product and a dataset via two YAML files that are attached to this issue (this may not be an actual issue, but rather a mistake of mine).

test_no_zeta-ncdump.txt c_O3_product.odc-metadata.txt c_O3_dataset.odc-metadata.txt

I can't attach the .nc binary file here, but it can be generated from the dump with ncgen. Also, I had to change the .yaml files' extension to .txt in order to attach them.

I then run the following commands via a reinit.sh script:

#!/bin/bash
echo 'Drop datacube db'
dropdb datacube
echo 'Create datacube db'
createdb -h localhost -U buck datacube
echo 'System init'
datacube -v system init

echo 'Add product c_O3'
datacube product add c_O3_product.odc-metadata.yaml 
echo 'Add dataset c_O3'
datacube dataset add c_O3_dataset.odc-metadata.yaml

All commands run fine, except for some performance warnings:

(odc_env) $ reinit.sh 
Drop datacube db
Create datacube db
System init
2022-06-07 12:30:52,414 90408 datacube INFO Running datacube command: /home/buck/anaconda3/envs/odc_env/bin/datacube -v system init
Initialising database...
2022-06-07 12:30:53,590 90408 datacube.drivers.postgres._core INFO Ensuring user roles.
2022-06-07 12:30:53,735 90408 datacube.drivers.postgres._core INFO Creating schema.
2022-06-07 12:30:53,737 90408 datacube.drivers.postgres._core INFO Creating tables.
2022-06-07 12:30:54,574 90408 datacube.drivers.postgres._core INFO Creating triggers.
2022-06-07 12:30:54,642 90408 datacube.drivers.postgres._core INFO Creating added column.
2022-06-07 12:30:54,665 90408 datacube.drivers.postgres._core INFO Adding role grants.
2022-06-07 12:30:54,672 90408 datacube.index.index INFO Adding default metadata types.
/home/buck/anaconda3/envs/odc_env/lib/python3.8/site-packages/datacube/drivers/postgres/_dynamic.py:50: SAWarning: Class CreateView will not make use of SQL compilation caching as it does not set the 'inherit_cache' attribute to ``True``.  This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions.  Set this attribute to True if this object can make use of the cache key generated by the superclass.  Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
  conn.execute(
Created.
Checking indexes/views.
2022-06-07 12:30:55,192 90408 datacube.drivers.postgres._api INFO Checking dynamic views/indexes. (rebuild views=True, indexes=False)
Done.
Add product c_O3
Adding "c_o3_no_z" (this might take a while)/home/buck/anaconda3/envs/odc_env/lib/python3.8/site-packages/datacube/drivers/postgres/_dynamic.py:50: SAWarning: Class CreateView will not make use of SQL compilation caching as it does not set the 'inherit_cache' attribute to ``True``.  This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions.  Set this attribute to True if this object can make use of the cache key generated by the superclass.  Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
  conn.execute(
 DONE
Add dataset c_O3
Indexing datasets  [####################################]  100%
(odc_env) $

If I now run the Python script I get:

...
***** RESULTS
<xarray.Dataset>
Dimensions:      (time: 1, y: 59, x: 61)
Coordinates:
  * time         (time) datetime64[ns] 2015-04-10T13:30:00
  * y            (y) float64 4.942e+06 4.946e+06 4.95e+06 ... 5.17e+06 5.174e+06
  * x            (x) float64 4.58e+05 4.62e+05 4.66e+05 ... 6.94e+05 6.98e+05
    spatial_ref  int32 32632
...

Comparing x and y coordinates, I get a -1000 m shift on y and a +1000 m shift on x relative to the values in the ncdump file:

y = 4943000, 4947000, 4951000, ...
x = 457000, 461000, 465000, ...

I tried to use the grids:transform matrix, but I don't understand how it works with NetCDF files; moreover, for some changes in the transformation matrix coefficients I get no change at all in the coordinates. I also defined geometry:coordinates in the dataset metadata file, but when I change its values the shape dimensions change and no longer match those of the .nc file.

So, surely I'm making some mistake, or a lot of them, but I don't understand where the problem is; if someone could give me some hints it would be greatly appreciated.

Thank you.

lucapaganotti avatar Jun 07 '22 10:06 lucapaganotti

It's not about NetCDF, it's due to pixel snapping. This data has 4000 CRS units per pixel. By default dc.load chooses to place pixel edges at multiples of the pixel size, which shifts your data by 1000. Since the default resampling is nearest you get the same data, just labelled differently; if, say, you set resampling to average you'll see that the pixel values are different too, as everything gets shifted by 1/4 of a pixel.
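For illustration, a minimal sketch of that snapping arithmetic in plain Python (the centre coordinates come from the ncdump values quoted earlier in this issue; this is not datacube's actual implementation):

pix = 4000.0

# Pixel-centre coordinates as stored in the .nc file:
native_x0, native_y0 = 457_000.0, 4_943_000.0

# The corresponding pixel edges sit half a pixel away from the centres.
edge_x0 = native_x0 - pix / 2            # 455000.0
edge_y0 = native_y0 - pix / 2            # 4941000.0

# dc.load snaps the edges to the nearest multiple of the pixel size...
snapped_x0 = round(edge_x0 / pix) * pix  # 456000.0 -> +1000 shift on x
snapped_y0 = round(edge_y0 / pix) * pix  # 4940000.0 -> -1000 shift on y

# ...so the centres of the loaded grid match the dc.load output above.
print(snapped_x0 + pix / 2, snapped_y0 + pix / 2)  # 458000.0 4942000.0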

Remember that datacube loads multiple datasets at once, so it can't just pick the "native projection" in the general case, and so it doesn't bother doing it even when possible. To specify precisely what pixel grid you want on output you have to use the like=GeoBox(...) parameter.
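As a sketch (assuming the native grid implied by the ncdump in this issue: 61 x 59 pixels of 4000 m, pixel centres starting at x=457000, y=4943000, Y axis not inverted), on datacube 1.8.x that would look something like:

from affine import Affine
from datacube.utils.geometry import CRS, GeoBox

# The transform maps pixel indices to pixel-*edge* coordinates, so the
# translation is the first pixel centre minus half a pixel.
native_transform = Affine(4000.0, 0.0, 455_000.0,
                          0.0, 4000.0, 4_941_000.0)
native_geobox = GeoBox(61, 59, native_transform, CRS('EPSG:32632'))

result = dc.load(product='c_o3_no_z', measurements=['c_O3'], like=native_geobox)

With like= set, dc.load loads onto exactly that grid instead of computing its own.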

See https://github.com/opendatacube/datacube-core/blob/82ed29c263eaf65f5c1fbb4e9207c99e9700b85c/datacube/testutils/io.py#L81

Kirill888 avatar Jun 07 '22 10:06 Kirill888

Another option is to pass the align=(1000, 1000) parameter to load, and maybe add

load:
  ...
  align:
    x: 1000
    y: 1000

to the product spec to default to that alignment for that product.
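The per-call form is a one-line change to the script from this issue, e.g.:

# Place pixel edges at multiples of the pixel size offset by 1000,
# matching the file's native grid (product/measurement names as above).
result = dc.load(product='c_o3_no_z',
                 measurements=['c_O3'],
                 align=(1000, 1000))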

Kirill888 avatar Jun 07 '22 11:06 Kirill888

Also, the transform section for the dataset is not right; try rio info <file> and see what it reports, that's the value you want.
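If you would rather stay in Python, the same information can be read with rasterio (assuming your GDAL/rasterio build has netCDF support; the filename and the NETCDF:"file":variable subdataset syntax below are illustrative):

import rasterio

# Open the variable of interest inside the netCDF file and print the
# affine transform and CRS that rio info would report.
with rasterio.open('NETCDF:"c_O3.nc":c_O3') as src:
    print(src.transform)
    print(src.crs)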

Maybe this will help: https://odc-geo.readthedocs.io/en/latest/intro-geobox.html

It looks like your data is not using an inverted Y axis, so I'd expect the transform to look like:

[4000, 0, tx,
 0, 4000, ty,
 0, 0, 1]

Kirill888 avatar Jun 07 '22 11:06 Kirill888

Hi Kirill888, thank you very much for your hints, I'll try to check them in the next few days. Yes, the transform section may well differ from the output of the rio info command due to some tests that were made.

Thanks again.

lucapaganotti avatar Jun 07 '22 11:06 lucapaganotti

The transform part in the dataset document is only used for things like figuring out native geoboxes; for actually loading data, the file metadata is always used. Keep in mind that the coordinates in NetCDF and in the loaded data are for pixel centres, but the translation in the transform part is for pixel edges, so don't forget to subtract 1/2 pixel from what you see in ncdump. Best is to use rio info: it does all the right things and deals with both flipped and non-flipped Y axes.
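For example, with the pixel-centre values from the ncdump in this issue, the half-pixel correction is just:

pix = 4000.0
first_centre_x, first_centre_y = 457_000.0, 4_943_000.0  # centres from ncdump

tx = first_centre_x - pix / 2  # 455000.0
ty = first_centre_y - pix / 2  # 4941000.0 (Y axis not inverted)

# giving the transform [4000, 0, 455000, 0, 4000, 4941000, 0, 0, 1]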

Kirill888 avatar Jun 07 '22 11:06 Kirill888

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 12 '22 09:10 stale[bot]