Missing assets are deserialized as `None`

Open gadomski opened this issue 1 year ago • 2 comments

If multiple items are converted to a table and then converted back to dictionaries, any asset that is missing from an item is deserialized as `None`, which is invalid STAC:

import pystac
import stac_geoparquet
from pystac import Item

item: Item = pystac.read_file(
    "https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/simple-item.json"
)
reduced_item = item.full_copy()
del reduced_item.assets["thumbnail"]

table = stac_geoparquet.arrow.parse_stac_items_to_arrow([item, reduced_item])
items = list(stac_geoparquet.arrow.stac_table_to_items(table))
assert items[1]["assets"][
    "thumbnail"
], f"the thumbnail asset is {items[1]['assets']['thumbnail']}"

Output:

Traceback (most recent call last):
  File "check.py", line 13, in <module>
    assert items[1]["assets"][
           ^^^^^^^^^^^^^^^^^^^
AssertionError: the thumbnail asset is None

gadomski avatar Aug 27 '24 21:08 gadomski

JSON distinguishes between null and undefined (a missing key); Arrow does not. Because Arrow is columnar, the column is defined for every row, so we essentially only preserve null and not undefined.
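To make the null-versus-undefined point concrete, here is a small pure-Python sketch. It is not pyarrow and not how stac-geoparquet is implemented; `rows_to_columns` and `columns_to_rows` are hypothetical helpers that only mimic the idea of a columnar layout, where every row has a slot in every column.

```python
# Illustration only: a toy columnar round trip, not pyarrow's implementation.


def rows_to_columns(rows):
    """Build one column per key seen in any row; absent keys become None."""
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    return {key: [row.get(key) for row in rows] for key in keys}


def columns_to_rows(columns):
    """Rebuild rows; each row receives a value from every column."""
    length = len(next(iter(columns.values()), []))
    return [
        {key: values[i] for key, values in columns.items()}
        for i in range(length)
    ]


assets = [
    {"visual": {"href": "a.tif"}, "thumbnail": {"href": "a.png"}},
    {"visual": {"href": "b.tif"}},  # this item has no thumbnail asset
]

round_tripped = columns_to_rows(rows_to_columns(assets))
print(round_tripped[1])
```

The second row went in without a `thumbnail` key and comes back with `thumbnail` set to `None`, because the column exists for the whole table.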

I believe pyarrow serializes all Arrow null values as None by default. But I agree we shouldn't be able to construct invalid STAC items, so perhaps we should manually remove None from asset values? Anywhere else that None is invalid? Or should we be coercing None to undefined everywhere?
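A minimal sketch of what removing `None` from asset values could look like on the dictionaries coming back from the table. `drop_null_assets` is a hypothetical helper written for illustration; it is not part of stac-geoparquet's API and is not necessarily how the eventual fix works.

```python
from typing import Any


def drop_null_assets(item: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of a STAC item dict without None-valued assets."""
    assets = item.get("assets")
    if not isinstance(assets, dict):
        # Nothing to clean; still return a copy so the input is never mutated.
        return dict(item)
    cleaned = {key: asset for key, asset in assets.items() if asset is not None}
    return {**item, "assets": cleaned}


# Example input shaped like the second item from the report above.
item = {
    "id": "reduced-item",
    "assets": {"visual": {"href": "b.tif"}, "thumbnail": None},
}
print(drop_null_assets(item)["assets"])
```

Only the top-level values of `assets` are filtered here; `None` values elsewhere in the item are left alone.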

kylebarron avatar Aug 28 '24 13:08 kylebarron

> perhaps we should manually remove None from asset values?

Yup, that's what I did here. I ran across the issue when doing some test translations of 1000 Sentinel-2 items from the PC. Some of the items had a missing preview asset, which in my experience is a sort-of-common thing to happen in real-world systems.

> Anywhere else that None is invalid?

I think in most cases it's ok, and that `assets` is a bit of a special case.

gadomski avatar Aug 28 '24 13:08 gadomski

Going to close as completed by https://github.com/stac-utils/stac-geoparquet/pull/111 as well.

gadomski avatar Jun 17 '25 13:06 gadomski