Sparse Data in STAC

Open matthias-mueller opened this issue 3 years ago • 2 comments

What would be a good approach to handle sparse data with STAC? There are two immediate cases:

  1. A geographic data set that consists of a few data locations/regions that are very far apart (large BBox but only a few small geometries)
  2. A spatio-temporal dataset that has large temporal gaps, e.g. a geo-located time series that runs over thirty years but only has a few weeks of data per year

(And of course there are combinations of these two cases)

For the general search case, a STAC client probably wants to exclude assets from the search results that nominally cover the search AOI (in terms of BBox and TimeRange) but, because of gaps, do not actually have any data within the search AOI.

I haven't found any structures in STAC that efficiently support queries on sparse data, so I assume it is not explicitly supported at the moment. But is it something that is on STAC's agenda or that was discussed while the spec was being developed?

Side note 1: For a very specific case this issue might relate to #803, because a search client could be more interested in which parts of the scene are cloud-free.

Side note 2: DynamicCatalogs have been suggested for Landsat archives, which could communicate gaps within item collections. But there are limits to this approach if the gaps occur in the leaf items.

matthias-mueller avatar Jun 29 '21 11:06 matthias-mueller

Some thoughts:

  • You could use a STAC collection, which allows multiple bboxes and temporal extents to be specified, and then link to the individual items, each of which covers a specific part of your data set. Would that work?
  • Queries are not supported in static STAC, you can check the API spec for advanced queries using CQL.
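A minimal sketch of the multi-extent idea from the first bullet, assuming the STAC 1.0 convention that the first bbox and the first interval give the overall extent and any further entries describe clusters of data. The extent values and the helper `may_contain_data` are hypothetical, and the check is client-side only. Because the spec does not pair individual bboxes with individual intervals, the test is conservative: it can rule a collection out, but a match does not guarantee data.

```python
from datetime import datetime

# Hypothetical collection extent; all values are made up for illustration.
collection_extent = {
    "spatial": {
        "bbox": [
            [-10.0, 35.0, 30.0, 60.0],  # overall extent (first entry)
            [-9.5, 36.0, -8.5, 37.0],   # data cluster 1
            [28.0, 58.0, 29.5, 59.5],   # data cluster 2
        ]
    },
    "temporal": {
        "interval": [
            ["1990-01-01T00:00:00Z", "2020-12-31T23:59:59Z"],  # overall extent
            ["1990-06-01T00:00:00Z", "1990-06-21T23:59:59Z"],  # campaign 1
            ["1991-06-01T00:00:00Z", "1991-06-21T23:59:59Z"],  # campaign 2
        ]
    },
}


def parse(ts):
    """Parse an RFC 3339 UTC timestamp such as 1990-06-01T00:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def bboxes_intersect(a, b):
    """2D bboxes as [west, south, east, north]; antimeridian not handled."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intervals_intersect(a, b):
    """Closed intervals only; open-ended (null) bounds are not handled."""
    return parse(a[0]) <= parse(b[1]) and parse(b[0]) <= parse(a[1])


def may_contain_data(extent, search_bbox, search_interval):
    """Return False if the search misses every listed cluster in space or time."""
    bboxes = extent["spatial"]["bbox"]
    intervals = extent["temporal"]["interval"]
    # Use the detailed entries when present, else fall back to the overall extent.
    detail_bboxes = bboxes[1:] or bboxes
    detail_intervals = intervals[1:] or intervals
    hits_space = any(bboxes_intersect(b, search_bbox) for b in detail_bboxes)
    hits_time = any(intervals_intersect(t, search_interval) for t in detail_intervals)
    return hits_space and hits_time


june_1990 = ["1990-06-05T00:00:00Z", "1990-06-10T00:00:00Z"]
gap_bbox = [0.0, 40.0, 10.0, 50.0]         # inside the overall bbox, outside both clusters
cluster_bbox = [-9.0, 36.2, -8.8, 36.8]    # inside cluster 1

in_gap = may_contain_data(collection_extent, gap_bbox, june_1990)
in_cluster = may_contain_data(collection_extent, cluster_bbox, june_1990)
```

Here `in_gap` comes out false even though the search area lies inside the overall bbox, which is the filtering behaviour asked about in the original question.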

m-mohr avatar Jul 02 '21 22:07 m-mohr

Thanks for the suggestions @m-mohr - I guess I was originally looking for an option to achieve this kind of filtering with static STAC.

Some comments on your reply:

  • Option (1) implies that providers have to break the original datasets into pieces. I am not sure that this is desirable, or sometimes even feasible, from a provider's perspective. At least it would not work for cataloging purposes in my projects.

  • For option (2) I was not aware of the integration of STAC and CQL. I've seen that STAC standardizes spatial and temporal (extent) properties. When you switch over to CQL, how do you find out the name of the temporal field, and can you be sure it is encoded as a standard ISO date-time?

matthias-mueller avatar Jul 04 '21 19:07 matthias-mueller

Closing this issue due to inactivity. @matthias-mueller please reach out if you still have questions.

PowerChell avatar Jul 11 '23 15:07 PowerChell