
Ingest collapses time range to a single point

Open Kirill888 opened this issue 7 years ago • 3 comments

Expected behaviour

dc.Dataset has a property .time, a time range covering the capture period from the earliest pixel to the latest. The ingestion process generates one or more datasets, each containing part or all of the original dataset's data, reprojected according to the GridSpec.

I expect the .time property of the ingested datasets to be the same as that of the input dataset.

Actual behaviour

The time range of the ingested dataset is a single point, i.e. ds.time[0] == ds.time[1], set to the mid-point of the original dataset's time interval.

Steps to reproduce the behaviour

On NCI you can check that ingested datasets have a single-point time range, even though they were ingested from data with a non-point time interval.

Running this on NCI:

import datacube
dc = datacube.Datacube()
ds = dc.index.datasets.get('e999002e-71c6-46ee-9032-ad94478926e9', include_sources=True)
print('ingested:', ds.time)
print('original:', ds.sources['0'].time)

produces:

ingested: Range(begin=datetime.datetime(2018, 2, 1, 0, 7, 7), end=datetime.datetime(2018, 2, 1, 0, 7, 7))
original: Range(begin=datetime.datetime(2018, 2, 1, 0, 6, 51), end=datetime.datetime(2018, 2, 1, 0, 7, 23))
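As a quick arithmetic check on the output above, the ingested time matches the mid-point of the original interval. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of datacube.

```python
from datetime import datetime

# Begin and end of the original dataset's time range, copied from the output above
begin = datetime(2018, 2, 1, 0, 6, 51)
end = datetime(2018, 2, 1, 0, 7, 23)

# Mid-point of the interval: half of the 32-second span added to the start
midpoint = begin + (end - begin) / 2
print(midpoint)  # 2018-02-01 00:07:07
```

The result equals both ends of the ingested dataset's range, which is consistent with the range being collapsed to its centre.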

Where it's broken

The ingestor uses this function to create a new dataset object:

https://github.com/opendatacube/datacube-core/blob/b6ca35143778aa5157d10247fe1645c0f9532961/datacube/model/utils.py#L176-L190

Notice that the only way to supply time information is via the center_time parameter; internally it is copied into the from_dt, to_dt and center_dt properties of the extent subtree of the metadata document.

Instead, this function should take a time_range copied from the parent datasource, perhaps keeping center_time as an optional convenience parameter for the case where the time range is a single point in time.
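A rough sketch of what that could look like, under stated assumptions: time_extent is a hypothetical helper, not the actual datacube function, and Range here is a local stand-in for datacube's range type. Only the field names from_dt, to_dt and center_dt come from the description above; storing datetime values directly is an assumption made for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Local stand-in for datacube's Range type (assumption, for illustration only)
Range = namedtuple('Range', ('begin', 'end'))


def time_extent(time_range=None, center_time=None):
    """Build the time fields of the extent subtree (hypothetical helper)."""
    if time_range is None:
        if center_time is None:
            raise ValueError('supply either time_range or center_time')
        # Convenience case: a single point in time becomes a zero-length range
        time_range = Range(center_time, center_time)

    begin, end = time_range
    if center_time is None:
        # Derive the centre from the range rather than the other way round
        center_time = begin + (end - begin) / 2

    return {'from_dt': begin, 'to_dt': end, 'center_dt': center_time}


# Range taken from the original dataset in the reproduction above
source_range = Range(datetime(2018, 2, 1, 0, 6, 51), datetime(2018, 2, 1, 0, 7, 23))
extent = time_extent(time_range=source_range)
```

With this shape the source range survives ingestion intact, while callers that only have a single timestamp can still pass center_time on its own.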

Kirill888 • Apr 17 '18 06:04

Thanks for the awesome write up Kirill!

If we're going to store time as a range, we need to fix this. Ingestion shouldn't be throwing away data.

Can @jeremyh or anyone else remind me what the benefits are of storing time as a range? It increases complexity over storing time as a single value, so it would be good to have a documented justification.

omad • Apr 18 '18 23:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Aug 08 '20 07:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Feb 16 '21 05:02

Ingestion will be deprecated in Datacube v1.9 and removed in v2, so this will not be fixed.

omad • May 18 '23 03:05