datacube-core

Single product matcher error fails if dataset document contains `datetime` content

Open whatnick opened this issue 4 months ago • 0 comments

Expected behaviour

If the single-product matcher fails, it should correctly report the errors it generated.

Actual behaviour

Because the error message is built with Python's built-in JSON serializer, the non-string internal representation of datetime values fails to serialize, producing errors such as the one below.

  File "/env/lib/python3.10/site-packages/odc/apps/dc_tools/utils.py", line 223, in index_update_dataset
    ds, err = doc2ds(metadata, uri)
  File "/env/lib/python3.10/site-packages/datacube/index/hl.py", line 323, in __call__
    dataset, err = self._ds_resolve(doc, uri)
  File "/env/lib/python3.10/site-packages/datacube/index/hl.py", line 240, in resolve
    return remap_lineage_doc(main_ds, resolve_ds, cache={}), None
  File "/env/lib/python3.10/site-packages/datacube/model/utils.py", line 354, in remap_lineage_doc
    return visit(root)
  File "/env/lib/python3.10/site-packages/datacube/model/utils.py", line 346, in visit
    return mk_node(ds,
  File "/env/lib/python3.10/site-packages/datacube/index/hl.py", line 235, in resolve_ds
    product = match_product(doc)
  File "/env/lib/python3.10/site-packages/datacube/index/hl.py", line 73, in match
    % (json.dumps(doc, indent=4),
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

Steps to reproduce the behaviour

Create a product document named, say, ls8_derived. Attempt to index into this product a dataset document whose product name is, say, derived. This triggers the error-handling code. However, if the dataset document contains datetime data such as

properties:
  datetime: 2024-04-18 06:18:48.308360Z

the error message itself will fail to serialize. The YAML reader is datetime-aware and creates a datetime object, which Python's built-in JSON serializer cannot convert back to a string.

This sort of roundtripping error is demonstrated below:

>>> import yaml, json
>>> data = yaml.full_load(open('20130327.yaml'))
>>> json.dumps(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable
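
One possible workaround, sketched here under assumptions rather than taken from datacube-core, is to pass a `default` hook to `json.dumps` so that date and datetime values are rendered as ISO 8601 strings. The helper name `json_fallback` and the hand-built `doc` below are hypothetical; `doc` only stands in for a dataset document after the YAML reader has parsed it.

```python
import json
from datetime import date, datetime, timezone


def json_fallback(obj):
    """Hypothetical `default` hook: render date/datetime values as ISO 8601 strings."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # Any other unsupported type is still reported as unserializable.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Illustrative stand-in for a parsed dataset document (assumed shape, not a
# real one): the YAML reader turns the `datetime` property into a datetime
# object like this.
doc = {
    "product": {"name": "derived"},
    "properties": {
        "datetime": datetime(2024, 4, 18, 6, 18, 48, 308360, tzinfo=timezone.utc),
    },
}

# Without the hook, serialization fails exactly as in the traceback above.
try:
    json.dumps(doc, indent=4)
except TypeError as exc:
    print(f"plain json.dumps fails: {exc}")

# With the hook, the document serializes and could be embedded in an error message.
print(json.dumps(doc, indent=4, default=json_fallback))
```

A shorter alternative is `json.dumps(doc, indent=4, default=str)`, which stringifies anything the encoder does not understand. That may be acceptable for a diagnostic message, although it gives less control over the output format.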

Environment information

  • Which datacube --version are you using? - Open Data Cube core, version 1.8.18
  • What datacube deployment/environment are you running against? - CSIRO AquaWatch Data Services

Note: Stale issues will be automatically closed after a period of six months with no activity. To ensure critical issues are not closed, tag them with the GitHub pinned tag. If you are a community member and not a maintainer, please escalate this issue to maintainers via GIS StackExchange or Slack.

whatnick, Apr 19 '24 00:04