datacube-core
Alignment load hint documentation is incorrect
Expected behaviour
Data loaded from ODC should be aligned with the original one.
Actual behaviour
I am working with data projected in EPSG:4326:
Size is 4332, 2000
Origin = (4.003000000000000,49.000000000000000)
Pixel Size = (0.003000000000000,-0.003000000000000)
Unfortunately, when loading the data with ODC, the data has one additional row and column, leading to a shift in the data. I attach a zip with a sample file and the product and dataset definitions. This screenshot shows the misalignment between the resulting data loaded from ODC vs the original one (layer below in dark gray is ODC):
I also tried using the align parameter but nothing changed; I am not sure how to properly use it. I also tried to use the load hints, but again no difference.
Steps to reproduce the behaviour
import datacube
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE')
print(data)
<xarray.Dataset>
Dimensions: (time: 1, latitude: 2001, longitude: 4333)
Coordinates:
* time (time) datetime64[ns] 2022-12-16T10:09:22
* latitude (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
* longitude (longitude) float64 4.003 4.006 4.01 4.013 ... 16.99 17.0 17.0
spatial_ref int32 4326
Data variables:
band_0 (time, latitude, longitude) float64 0.0 0.0 0.0 ... nan nan 0.0
Attributes:
crs: EPSG:4326
grid_mapping: spatial_ref
data.geobox.extent.boundingbox
BoundingBox(left=4.002, bottom=42.999, right=17.001, top=49.002)
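For reference, the grid implied by the original raster can be reconstructed with simple arithmetic from the gdalinfo values above; comparing it with the loaded bounding box shows the extra pixel per dimension (a sketch, values copied from this issue):

```python
# Grid implied by the original file (Origin, Pixel Size and Size from above).
left, top = 4.003, 49.0
nx, ny, res = 4332, 2000, 0.003

right = left + nx * res     # -> 16.999 (up to float noise)
bottom = top - ny * res     # -> 43.0 (up to float noise)

# dc.load instead returned BoundingBox(left=4.002, bottom=42.999,
# right=17.001, top=49.002): 4333 x 2001 pixels, i.e. one extra row
# and column, with the grid shifted relative to the original.
print((left, bottom, right, top))
```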
Environment information
Datacube version = 1.8.9
Python version = 3.9.13
I feel like I've seen something like this before, but can't remember the details. I suspect there's a mismatch in assumptions about cell coordinates (edge vs centre) somewhere.
Indeed, the offset in the attached image looks like it is off by half-a-pixel rather than a whole pixel.
@clausmichele it's most likely doing the right thing, but documentation on the topic is lacking, please read through this issue first:
https://github.com/opendatacube/odc-stac/issues/95
I know that the coordinates of the data I'm seeing are the pixel centres and not the boundaries, and I don't care if there's padding, but the problem is that the data is not aligned with the original. Could you please try with the data I provided? I will change the title of this issue to reflect the problem.
but that's the point, it won't be aligned with the original data. It's simply not possible in the general case when there is more than one dataset, or even more than one band... To use exactly the grid you need, you have to use like=GeoBox(...). "Native load" is not something dc.load supports, even in situations where it should be possible (1 band of 1 dataset, for example). Trying to use the align=(...) parameter is just an exercise in frustration and confusion; the best option is to construct the exact GeoBox you want and then pass it in the like= parameter.
Also, you should remove the "storage" section from your product spec.
Thanks @Kirill888, using like + GeoBox solved the issue for me! Anyway, I didn't find any good docs about this; I guess it's a recurring issue many will encounter.
import datacube
from datacube.utils.geometry import GeoBox
import affine
ref_geobox = GeoBox(4332,2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE',like=ref_geobox)
print(data)
<xarray.Dataset>
Dimensions: (time: 1, latitude: 2000, longitude: 4332)
Coordinates:
* time (time) datetime64[ns] 2022-12-16T10:09:22
* latitude (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
* longitude (longitude) float64 4.005 4.008 4.011 ... 16.99 16.99 17.0
spatial_ref int32 4326
Data variables:
band_0 (time, latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
crs: EPSG:4326
grid_mapping: spatial_ref
For storage, why should I remove it? Just in this case since I'll use like=GeoBox or in general? Because I didn't see deprecation warnings when using it and it's also still in the docs https://datacube-core.readthedocs.io/en/latest/installation/product-definitions.html?highlight=product
@clausmichele In general, this definition is incomplete: it's missing tile_size, so it would have zero impact. See the code snippet below; the storage section maps to .grid_spec, but since it's incomplete it's as if it's not even there:
https://github.com/opendatacube/datacube-core/blob/7f587ef075e30c15bb81e14c6dfa14d3080d337a/datacube/model/__init__.py#L516-L521
The storage section was originally there for ingested products, i.e. products that are regularly tiled. It's not enough to have uniform resolution and projection across datasets; there is also an expectation for each dataset to have the same size in pixels, and for any two datasets to overlap either fully or not at all.
You want to use "load hints" instead, to supply the default projection and resolution to use during dc.load from the product:
https://datacube-core.readthedocs.io/en/latest/about-core-concepts/products.html#product-definition-api
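For this product, such a load section might look like the sketch below (same crs and resolution as the product; the align values are the ones eventually found to reproduce the original grid later in this thread):

```yaml
load:
  crs: EPSG:4326
  resolution:
    longitude: 0.003
    latitude: -0.003
  align:
    longitude: 0.001
    latitude: 0.001
```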
errors in docs
@SpacemanPaul @pindge the docs for the product storage section are missing the essential tile_size parameter, and also have indentation issues in the yaml:
https://github.com/opendatacube/datacube-core/blob/develop/docs/installation/product-definitions.rst
@Kirill888 I did some tests using load instead of storage, but I can't understand how it is supposed to work, since the values do not seem to follow any logic. So, as a starting point, using like + GeoBox gives the right coordinates:
import datacube
from datacube.utils.geometry import GeoBox
import affine
ref_geobox = GeoBox(4332,2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,like=ref_geobox,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045 4.0075 4.0105 ... 16.9915 16.9945 16.9975]
Then, here are the results using different align parameters:
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0
# latitude: 0.0
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0015
# latitude: 0.0015
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.999 48.996 48.993 ... 43.005 43.002 42.999]
[ 4.002 4.005 4.008 ... 16.992 16.995 16.998]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.003
# latitude: 0.003
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.001
# latitude: 0.001
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045 4.0075 4.0105 ... 16.9915 16.9945 16.9975]
-> correct!
But before finding these values, I also tried 0.002, 0.0005 and many more. So what's the logic behind it? The right align value should be 0.0015 in my opinion, looking at the docs (half the resolution, which here is 0.003):
align.{x,y} (optional)
By default the pixel grid is aligned such that pixel boundaries fall on x,y axis. This option allows us to translate the pixel grid. For example, to ensure that the pixel center of a 30m pixel grid is coincident with 0,0 use align:{x:15,y:15}.
This issue shouldn't be closed yet. It is not yet clear why the load + align parameters in the product definition give the outputs shown.
a value in the coord list = align + (n + 0.5) * resolution, where n is an integer.
The +0.5 comes from converting from grid-edge coordinates to pixel-centre coordinates.
E.g. for:
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0
# latitude: 0.0
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
49.0005 = 0.0 + (n+0.5) * 0.003 (n=16333, an integer)
I think all the numbers you see as "wrong" work out with that.
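A quick numeric check of that rule against the coordinates quoted above (a sketch; the values are copied from this thread):

```python
import math

# Every reported coordinate should satisfy coord = a + (n + 0.5) * r
# for some integer n, here with align a = 0.0 and resolution r = 0.003.
a, r = 0.0, 0.003
for coord in (49.0005, 4.0035, 4.0065, 16.9965, 16.9995):
    n = (coord - a) / r - 0.5
    assert math.isclose(n, round(n), abs_tol=1e-6), coord
```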
@clausmichele for a coordinate of a pixel EDGE x, resolution r and alignment argument a, we have the following:
- 0 <= a < |r|, where |..| is the absolute value operator
- x == (N*r + a), where N is an integer
So every pixel spans from (N*r + a) to ((N+1)*r + a), with the centre being N*r + a + r/2. So if x' = ds.x[0] (the left-most coordinate recorded in xarray), you should expect (x' - r/2 - a)/r to be pretty close to some integer value. Obviously, floating-point issues are likely when working with a resolution like 0.003.
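That expectation can be verified numerically for the align: 0.001 load shown earlier (a sketch; x' is the top-most latitude centre from that output):

```python
import math

# (x' - r/2 - a) / r should be close to an integer.
x_prime = 48.9985   # ds.latitude[0] from the align=0.001 load
r = -0.003          # latitude resolution
a = 0.001           # align value
n = (x_prime - r / 2 - a) / r
assert math.isclose(n, round(n), abs_tol=1e-6)   # n == -16333
```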
x == (N*r + a), where N is an integer
So, if a = 0 and I know x, which is the pixel edge, then I can find N.
In my case, x = 49.000, r = -0.003 -> N = x/r = 49/-0.003 = -16333.(3); we round the result to get an integer and we get N = -16333.
Now that I know N, if I want my pixel edge to be at 49.00:
(N*r + a) = 49.00 -> a = 49.00 - N*r -> a = 49.00 - (-16333 * -0.003) -> a = 0.001
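The same computation can be done with exact rational arithmetic, which sidesteps the floating-point issues mentioned above (a sketch; align_for_edge is a hypothetical helper, not ODC API):

```python
from fractions import Fraction

def align_for_edge(x, r):
    """Alignment a in [0, |r|) such that x == N*r + a for some integer N."""
    return x % abs(r)

# Desired pixel edge at latitude 49.0, resolution -0.003:
a = align_for_edge(Fraction(49), Fraction(-3, 1000))
assert a == Fraction(1, 1000)   # i.e. align: 0.001, matching the derivation
```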
Maybe all of this should be documented better in the ODC docs? This rounding to an integer (when computing N) is what creates the unexpected pixel centre. Probably a good example could save other people from the same doubts that I resolved thanks to you!
Edit: is the rounding to get N always down to the closest integer?
There's a lot that could be better documented in the ODC docs. Documentation PRs are most welcome!
Re: load v storage in product metadata.
Storage should be deprecated - use load. The exact behaviour of storage at the moment is complicated, but it almost definitely doesn't do what you want. Yes, the documentation is wildly incorrect.