datacube-core
Ingest collapses time range to a single point
Expected behaviour
dc.Dataset has a .time property: a time range covering the capture period from the earliest pixel to the latest. The ingestion process generates one or more datasets, each containing part or all of the original dataset's data, reprojected according to the GridSpec.
I expect the .time property of the ingested datasets to be the same as that of the input dataset.
Actual behaviour
The time range of the ingested dataset is a single point, i.e. ds.time[0] == ds.time[1], and it is set to the mid-point of the original dataset's time interval.
Steps to reproduce the behaviour
On NCI you can check that ingested datasets have a single-point time range, even though they were ingested from data with a non-point time interval.
Running this on NCI:
import datacube
dc = datacube.Datacube()
ds = dc.index.datasets.get('e999002e-71c6-46ee-9032-ad94478926e9', include_sources=True)
print('ingested:', ds.time)
print('original:', ds.sources['0'].time)
produces:
ingested: Range(begin=datetime.datetime(2018, 2, 1, 0, 7, 7), end=datetime.datetime(2018, 2, 1, 0, 7, 7))
original: Range(begin=datetime.datetime(2018, 2, 1, 0, 6, 51), end=datetime.datetime(2018, 2, 1, 0, 7, 23))
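To double-check the mid-point claim without access to the NCI index, the two ranges printed above can be compared directly. This is only a sketch: Range here is a local namedtuple stand-in for the datacube type, and midpoint is a helper introduced for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Local stand-in for the Range type shown in the output above (assumption).
Range = namedtuple('Range', ['begin', 'end'])

# Values copied from the output above.
ingested = Range(begin=datetime(2018, 2, 1, 0, 7, 7),
                 end=datetime(2018, 2, 1, 0, 7, 7))
original = Range(begin=datetime(2018, 2, 1, 0, 6, 51),
                 end=datetime(2018, 2, 1, 0, 7, 23))


def midpoint(time_range):
    """Return the instant halfway between begin and end (illustrative helper)."""
    return time_range.begin + (time_range.end - time_range.begin) / 2


# The ingested range is a single point...
print(ingested.begin == ingested.end)
# ...and that point is the mid-point of the original 32-second interval.
print(midpoint(original) == ingested.begin)
```

Both comparisons print True for the values above.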
Where it's broken
The ingestor uses this function to create a new dataset object:
https://github.com/opendatacube/datacube-core/blob/b6ca35143778aa5157d10247fe1645c0f9532961/datacube/model/utils.py#L176-L190
Notice how the only way to supply time information is via the center_time parameter; internally it is copied into the from_dt, to_dt and center_dt properties of the extent subtree of the metadata document.
Instead, this should take a time_range copied from the parent datasource, maybe with an optional convenience parameter center_time for when the time range is a single point in time.
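A rough sketch of what that could look like, for discussion only. The function name time_extent, its signature and the ISO-string formatting are all assumptions made for illustration and are not the existing datacube API; only the from_dt, to_dt and center_dt keys come from the metadata document described above.

```python
from datetime import datetime


def time_extent(time_range=None, center_time=None):
    """Hypothetical helper: build the time fields of the `extent` subtree.

    time_range is a (begin, end) pair of datetimes; center_time is an
    optional convenience for datasets captured at a single instant.
    """
    if time_range is None:
        if center_time is None:
            raise ValueError('supply time_range or center_time')
        # Single point in time: the range degenerates to (t, t).
        time_range = (center_time, center_time)

    begin, end = time_range
    if begin > end:
        raise ValueError('time_range begin must not be after end')

    if center_time is None:
        # Derive the centre from the range instead of discarding the range.
        center_time = begin + (end - begin) / 2

    return {
        'from_dt': begin.isoformat(),
        'to_dt': end.isoformat(),
        'center_dt': center_time.isoformat(),
    }


# Range taken from the original dataset in the reproduction above.
extent = time_extent(time_range=(datetime(2018, 2, 1, 0, 6, 51),
                                 datetime(2018, 2, 1, 0, 7, 23)))
print(extent)
```

With this shape the ingestor would pass the source dataset's range straight through, and callers that only know a single timestamp could keep using center_time.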
Thanks for the awesome write up Kirill!
If we're going to store time as a range, we need to fix this. Ingestion shouldn't be throwing away data.
Can @jeremyh or anyone else remind me what the benefits are of storing time as a range? It increases complexity over storing time as a single value, so it would be good to have a documented justification.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Ingestion will be deprecated in Datacube v1.9 and removed in v2, so this will not be fixed.