datacube-core
Alignment load hint documentation is incorrect
Expected behaviour
Data loaded from ODC should be aligned with the original one.
Actual behaviour
I am working with data projected in EPSG:4326:
Size is 4332, 2000
Origin = (4.003000000000000,49.000000000000000)
Pixel Size = (0.003000000000000,-0.003000000000000)
Unfortunately, when loading the data with ODC, the data has one additional row and column, leading to a shift in the data. I attach a zip with a sample file and the product and dataset definitions. This screenshot shows the misalignment between the resulting data loaded from ODC vs the original one (layer below in dark gray is ODC):
I also tried using the align parameter but nothing changed; I am not sure how to properly use it. I also tried to use the load hints, but again no difference.
Steps to reproduce the behaviour
import datacube
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE')
print(data)
<xarray.Dataset>
Dimensions: (time: 1, latitude: 2001, longitude: 4333)
Coordinates:
* time (time) datetime64[ns] 2022-12-16T10:09:22
* latitude (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
* longitude (longitude) float64 4.003 4.006 4.01 4.013 ... 16.99 17.0 17.0
spatial_ref int32 4326
Data variables:
band_0 (time, latitude, longitude) float64 0.0 0.0 0.0 ... nan nan 0.0
Attributes:
crs: EPSG:4326
grid_mapping: spatial_ref
data.geobox.extent.boundingbox
BoundingBox(left=4.002, bottom=42.999, right=17.001, top=49.002)
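For reference, the grid implied by the original raster can be reconstructed with simple arithmetic from the gdalinfo values above; comparing it with the loaded bounding box shows the extra pixel per dimension (a sketch, values copied from this issue):

```python
# Grid implied by the original file (Origin, Pixel Size and Size from above).
left, top = 4.003, 49.0
nx, ny, res = 4332, 2000, 0.003

right = left + nx * res     # -> 16.999 (up to float noise)
bottom = top - ny * res     # -> 43.0 (up to float noise)

# dc.load instead returned BoundingBox(left=4.002, bottom=42.999,
# right=17.001, top=49.002): 4333 x 2001 pixels, i.e. one extra row
# and column, with the grid shifted relative to the original.
print((left, bottom, right, top))
```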
Environment information
Datacube version = 1.8.9
Python version = 3.9.13
I feel like I've seen something like this before, but can't remember the details. I suspect there's a mismatch in assumptions about cell coordinates (edge vs centre) somewhere.
Indeed, the offset in the attached image looks like it is off by half-a-pixel rather than a whole pixel.
@clausmichele it's most likely doing the right thing, but documentation on the topic is lacking, please read through this issue first:
https://github.com/opendatacube/odc-stac/issues/95
I know that the coordinates of the data I'm seeing are the pixel centres and not the boundaries, and I don't care if there's padding, but the problem is that the data is not aligned with the original. Could you please try with the data I provided? I will change the title of this issue to reflect the problem.
but that's the point, it won't be aligned with the original data. It's simply not possible in the general case when there is more than one dataset, or even more than one band... To use exactly the grid you need, you have to use like=GeoBox(...). "Native load" is not something dc.load supports, even in situations where it should be possible (1 band of 1 dataset, for example). Trying to use the align=(...) parameter is just an exercise in frustration and confusion; the best option is to construct the exact GeoBox you want and then pass it in the like= parameter.
Also, you should remove the "storage" section from your product spec.
Thanks @Kirill888, using like + GeoBox solved the issue for me! Anyway, I didn't find any good docs about this; I guess it's a recurring issue many will encounter.
import datacube
from datacube.utils.geometry import GeoBox
import affine
ref_geobox = GeoBox(4332,2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
data = dc.load(product='SAMPLE_DATACUBE',like=ref_geobox)
print(data)
<xarray.Dataset>
Dimensions: (time: 1, latitude: 2000, longitude: 4332)
Coordinates:
* time (time) datetime64[ns] 2022-12-16T10:09:22
* latitude (latitude) float64 49.0 49.0 48.99 48.99 ... 43.01 43.0 43.0
* longitude (longitude) float64 4.005 4.008 4.011 ... 16.99 16.99 17.0
spatial_ref int32 4326
Data variables:
band_0 (time, latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
crs: EPSG:4326
grid_mapping: spatial_ref
For storage, why should I remove it? Just in this case since I'll use like=GeoBox or in general? Because I didn't see deprecation warnings when using it and it's also still in the docs https://datacube-core.readthedocs.io/en/latest/installation/product-definitions.html?highlight=product
@clausmichele In general, this definition is incomplete: it's missing tile_size, so it would have zero impact. See the code snippet below; the storage section maps to .grid_spec, but since it's incomplete it's as if it's not even there:
https://github.com/opendatacube/datacube-core/blob/7f587ef075e30c15bb81e14c6dfa14d3080d337a/datacube/model/__init__.py#L516-L521
The storage section was originally there for ingested products, i.e. products that are regularly tiled. It's not enough to have uniform resolution and projection across datasets; there is also an expectation for each dataset to have the same size in pixels, and for any two datasets to overlap either fully or not at all.
You want to use "load hints" instead, to supply the default projection and resolution to use during dc.load from the product:
https://datacube-core.readthedocs.io/en/latest/about-core-concepts/products.html#product-definition-api
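For this product, such a load section might look like the sketch below (same crs and resolution as the product; the align values are the ones eventually found to reproduce the original grid later in this thread):

```yaml
load:
  crs: EPSG:4326
  resolution:
    longitude: 0.003
    latitude: -0.003
  align:
    longitude: 0.001
    latitude: 0.001
```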
errors in docs
@SpacemanPaul @pindge the docs for the product storage section are missing the essential tile_size parameter, and also have indentation issues in the yaml:
https://github.com/opendatacube/datacube-core/blob/develop/docs/installation/product-definitions.rst
@Kirill888 I did some tests using load instead of storage, but I can't understand how it is supposed to work, since the values do not seem to follow any logic. So, as a starting point, using like + GeoBox gives the right coordinates:
import datacube
from datacube.utils.geometry import GeoBox
import affine
ref_geobox = GeoBox(4332,2000, affine.Affine(0.0030000000000001137, 0.0, 4.003, 0.0, -0.0030000000000001137, 49.0), 'EPSG:4326')
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,like=ref_geobox,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045 4.0075 4.0105 ... 16.9915 16.9945 16.9975]
Then, here are the results using different align parameters:
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0
# latitude: 0.0
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0015
# latitude: 0.0015
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.999 48.996 48.993 ... 43.005 43.002 42.999]
[ 4.002 4.005 4.008 ... 16.992 16.995 16.998]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.003
# latitude: 0.003
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
-> wrong
import datacube
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.001
# latitude: 0.001
dc = datacube.Datacube()
query = {}
query['product'] = 'SAMPLE_DATACUBE'
datasets = dc.find_datasets(**query)
data = dc.load(datasets=datasets,**query)
print(data.latitude.values)
print(data.longitude.values)
[48.9985 48.9955 48.9925 ... 43.0075 43.0045 43.0015]
[ 4.0045 4.0075 4.0105 ... 16.9915 16.9945 16.9975]
-> correct!
But before finding these values, I also tried 0.002, 0.0005 and many more. So what's the logic behind it? The right align value should be 0.0015 in my opinion, looking at the docs (half the resolution, which here is 0.003):
align.{x,y} (optional)
By default the pixel grid is aligned such that pixel boundaries fall on x,y axis. This option allows us to translate the pixel grid. For example, to ensure that the pixel center of a 30m pixel grid is coincident with 0,0 use align:{x:15,y:15}.
This issue shouldn't be closed yet. It is not yet clear why the load + align parameters in the product definition give the outputs shown.
a value in the coord list = align + (n + 0.5) * resolution, where n is an integer.
The +0.5 comes from converting from grid-edge coordinates to pixel-centre coordinates.
E.g. for:
# load:
# crs: EPSG:4326
# resolution:
# longitude: 0.003
# latitude: -0.003
# align:
# longitude: 0.0
# latitude: 0.0
[49.0005 48.9975 48.9945 ... 43.0065 43.0035 43.0005]
[ 4.0035 4.0065 4.0095 ... 16.9935 16.9965 16.9995]
49.0005 = 0.0 + (n+0.5) * 0.003 (n=16333, an integer)
I think all the numbers you see as "wrong" work out with that.
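A quick numeric check of that rule against the coordinates quoted above (a sketch; the values are copied from this thread):

```python
import math

# Every reported coordinate should satisfy coord = a + (n + 0.5) * r
# for some integer n, here with align a = 0.0 and resolution r = 0.003.
a, r = 0.0, 0.003
for coord in (49.0005, 4.0035, 4.0065, 16.9965, 16.9995):
    n = (coord - a) / r - 0.5
    assert math.isclose(n, round(n), abs_tol=1e-6), coord
```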
@clausmichele for a coordinate of a pixel EDGE x, resolution r and alignment argument a, we have the following:
- 0 <= a < |r|, where |..| is the absolute value operator
- x == (N*r + a), where N is an integer
So every pixel spans from (N*r + a) to ((N+1)*r + a), with the centre being N*r + a + r/2. So if x' = ds.x[0] (the left-most coordinate recorded in xarray), you should expect (x' - r/2 - a)/r to be pretty close to some integer value. Obviously, floating-point issues are likely when working with a resolution like 0.003.
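That expectation can be verified numerically for the align: 0.001 load shown earlier (a sketch; x' is the top-most latitude centre from that output):

```python
import math

# (x' - r/2 - a) / r should be close to an integer.
x_prime = 48.9985   # ds.latitude[0] from the align=0.001 load
r = -0.003          # latitude resolution
a = 0.001           # align value
n = (x_prime - r / 2 - a) / r
assert math.isclose(n, round(n), abs_tol=1e-6)   # n == -16333
```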
x == (N*r + a), where N is an integer
So, if a = 0 and I know x, which is the pixel edge, then I can find N.
In my case, x = 49.000, r = -0.003 -> N = x/r = 49/-0.003 = -16333.(3); we round the result to get an integer and we get N = -16333.
Now that I know N, if I want my pixel edge to be at 49.00:
(N*r + a) = 49.00 -> a = 49.00 - N*r -> a = 49.00 - (-16333 * -0.003) -> a = 0.001
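The same computation can be done with exact rational arithmetic, which sidesteps the floating-point issues mentioned above (a sketch; align_for_edge is a hypothetical helper, not ODC API):

```python
from fractions import Fraction

def align_for_edge(x, r):
    """Alignment a in [0, |r|) such that x == N*r + a for some integer N."""
    return x % abs(r)

# Desired pixel edge at latitude 49.0, resolution -0.003:
a = align_for_edge(Fraction(49), Fraction(-3, 1000))
assert a == Fraction(1, 1000)   # i.e. align: 0.001, matching the derivation
```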
Maybe all of this should be documented better in the ODC docs? This rounding to an integer (when computing N) is what creates the unexpected pixel centre. Probably a good example could save other people from the same doubts that I resolved thanks to you!
Edit: is the rounding to get N always down to the closest integer?
There's a lot that could be better documented in the ODC docs. Documentation PRs are most welcome!
Re: load v storage in product metadata.
Storage should be deprecated - use load. The exact behaviour of storage at the moment is complicated, but it almost definitely doesn't do what you want. Yes, the documentation is wildly incorrect.