stac-spec

Vector data support

Open McSurf84 opened this issue 3 years ago • 6 comments

Hi everybody, I have a topic to discuss in the next meeting and wanted to share it beforehand so people can think about it. Currently, STAC is associated with raster data, and the FAQ even states that vector data should be handled directly with WFS. But what speaks against managing vector data in STAC? And what needs to be done so that STAC can manage all kinds of spatial data and become the single source of (data) truth? I look forward to your opinions 👐

McSurf84 • Jul 30 '21 09:07

Welcome @McSurf84!

I wasn't able to find a document that describes exactly how this 'should' work, so I'll try a brief write-up here and aim to produce something that explains it better.

So the issue is really that putting vector data on an 'item' is the wrong abstraction level. Each row of a vector dataset corresponds more closely to an 'item', and the vector dataset itself belongs at the 'collection' level. I've been working on fleshing out the more 'generic' OGC collection, which is the piece we'd need to fit vector data into the 'STAC world'. But STAC is really more about search within a collection.
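To make the two abstraction levels concrete, here is a minimal Python sketch, assuming the vector dataset is available as a plain GeoJSON FeatureCollection: the dataset maps to one collection-like dict and each feature (row) maps to one item-like dict. The helper names (`bbox_of`, `features_to_stac`), the item ID scheme, and the example data are hypothetical, and the output is deliberately stripped down (e.g. no `datetime`), so it is not a fully valid STAC document.

```python
from typing import Any


def bbox_of(coords: Any) -> list:
    """Compute [minx, miny, maxx, maxy] from nested GeoJSON coordinate arrays."""
    xs: list = []
    ys: list = []

    def walk(c: Any) -> None:
        # A position is a list whose first element is a number.
        if c and isinstance(c[0], (int, float)):
            xs.append(c[0])
            ys.append(c[1])
        else:
            for part in c:
                walk(part)

    walk(coords)
    return [min(xs), min(ys), max(xs), max(ys)]


def features_to_stac(dataset_id: str, fc: dict) -> tuple:
    """Map a vector dataset to one collection and each feature to one item."""
    items = []
    for i, feature in enumerate(fc["features"]):
        geom = feature["geometry"]
        items.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": f"{dataset_id}-{i}",  # hypothetical ID scheme
            "collection": dataset_id,
            "geometry": geom,
            "bbox": bbox_of(geom["coordinates"]),
            "properties": dict(feature.get("properties") or {}),
            "links": [],
            "assets": {},
        })
    # The collection's spatial extent is the union of all item bboxes.
    boxes = [item["bbox"] for item in items]
    collection = {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": dataset_id,
        "description": f"Vector dataset {dataset_id}",
        "license": "proprietary",
        "extent": {
            "spatial": {"bbox": [[min(b[0] for b in boxes), min(b[1] for b in boxes),
                                  max(b[2] for b in boxes), max(b[3] for b in boxes)]]},
            "temporal": {"interval": [[None, None]]},
        },
        "links": [],
    }
    return collection, items


# Hypothetical two-feature dataset.
roads = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 2]]},
         "properties": {"name": "A"}},
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [3, -1]},
         "properties": {"name": "B"}},
    ],
}
collection, items = features_to_stac("roads", roads)
```

With millions of rows this one-item-per-feature mapping is exactly what gets unwieldy, which is why the dataset-level 'collection' is the more natural unit to catalog.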

I explain this some more at https://github.com/cholmes/ogc-collection#stac-items-and-records, and that repo captures more of the thinking.

So what we want to get to is a definition of a 'dataset collection', which will include a collection of STAC Items (like 'landsat') as well as vector datasets (like 'tiger roads' in the US). And then a STAC API would be able to offer collection level search, implementing the OGC Records API. But in terms of core constructs I want to define the 'dataset collection' at a level 'above' STAC, and keep STAC focused on cataloging of 'assets'.

We have been making good progress with the OGC API Records group, and are now quite aligned on this path. They're working on refactoring their spec to help make the relationship clear. See https://github.com/opengeospatial/ogcapi-records/pull/129

cholmes • Jul 30 '21 15:07

@McSurf84 and others: We'll need some vector support in the future as well, so I'm wondering what your requirements are and what is missing for you in the spec. Simply linking to, let's say, a GeoPackage file is not an issue at all right now. So making it work for vector files in general doesn't seem to be a problem; it is probably just that some additional metadata is lacking, or the question of how to structure it?
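As an illustration of "simply linking to a GeoPackage file", here is a minimal sketch of a STAC Item whose only asset is a `.gpkg` file, using the media type registered for GeoPackage. The function name, item ID, URL, bbox, and timestamp are hypothetical, and using the bbox as the item footprint is a simplifying assumption.

```python
import json


def make_vector_item(item_id: str, href: str, bbox: list, datetime_utc: str) -> dict:
    """Build a minimal STAC Item whose only asset is a GeoPackage file."""
    minx, miny, maxx, maxy = bbox
    # Simplification: use the bbox as a rectangular footprint (closed ring).
    footprint = {
        "type": "Polygon",
        "coordinates": [[[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": footprint,
        "bbox": bbox,
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                # Media type registered for GeoPackage.
                "type": "application/geopackage+sqlite3",
                "roles": ["data"],
            }
        },
    }


# Hypothetical example: an "admin boundaries" dataset covering roughly Germany.
item = make_vector_item(
    "admin-boundaries",
    "https://example.com/data/admin-boundaries.gpkg",
    [5.8, 47.2, 15.1, 55.1],
    "2021-10-20T00:00:00Z",
)
print(json.dumps(item, indent=2))
```

Nothing here is vector-specific at the Item level; what is missing is metadata describing the contents of the file (layers, attributes, geometry types), which is the open question in this thread.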

What we'd probably have in our case (openEO) is collections of e.g. "points of trees in Germany" or "admin boundaries" (that a user can load into the processing pipeline) or some vector data that we export from a processing pipeline (e.g. "zonal statistics for forests in Australia").

m-mohr • Oct 20 '21 15:10

The original question was about the fact that the FAQ does not recommend storing vector data in STAC. We put this up for discussion in a STAC meeting, and various ideas came up there about how something like this could be done. Since then, however, we have neglected the topic as well.

We are in the process of creating a new spatial data infrastructure for our state agency and would like to use STAC to compile an index of the available data. Of course, this also concerns vector data. Here it would be a good idea to group the geodata rather than have every single feature become an item in STAC. To take up your example of tree points, this could otherwise become confusing very quickly. Grouping by parcels, urban areas, or similar units would definitely be an option here.
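The grouping idea can be sketched as follows, assuming point features that carry a grouping attribute (here a hypothetical `district` property): instead of one item per tree, each group becomes one item with a bbox and a feature count, and the group's features would live in one asset file per item (not shown). The function name, the `trees-` ID prefix, and the `feature_count` field are hypothetical, not part of any STAC extension.

```python
from collections import defaultdict


def group_points(features: list, key: str) -> dict:
    """Group point features by a property and summarize each group as one item."""
    groups = defaultdict(list)
    for f in features:
        groups[f["properties"][key]].append(f)

    items = {}
    for group_id, members in groups.items():
        xs = [m["geometry"]["coordinates"][0] for m in members]
        ys = [m["geometry"]["coordinates"][1] for m in members]
        items[group_id] = {
            "id": f"trees-{group_id}",  # hypothetical ID scheme
            "bbox": [min(xs), min(ys), max(xs), max(ys)],
            # Hypothetical summary field; a real extension would define its own.
            "properties": {"feature_count": len(members)},
        }
    return items


# Hypothetical input: three tree points in two districts.
trees = [
    {"geometry": {"type": "Point", "coordinates": [x, y]}, "properties": {"district": d}}
    for x, y, d in [(1.0, 1.0, "north"), (2.0, 3.0, "north"), (5.0, 5.0, "south")]
]
items = group_points(trees, "district")
```

The choice of grouping key (parcel, district, map sheet) then determines how fine-grained spatial search over the catalog can be.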

The topic will get more attention in the near future, but we would also like to exchange experiences with others and, where possible, stay consistent with other users / STAC creators.

McSurf84 • Oct 25 '21 14:10

I don't know if it's relevant to this issue, but I'll share what we're doing on the Planetary Computer.

I've come at this from the point of view of tabular data, where some of the columns happen to contain (vector) geometries. From that point of view, STAC has been a natural fit:

In our case, the actual assets are links to parquet datasets (technically, the root of the parquet dataset in blob storage, which might be partitioned into many files). You could imagine modeling a spatially partitioned parquet dataset as a collection of items (each with their own geometry), but we haven't explored that yet.

The example notebook for US census gives a good overview of how STAC is used for data access.

We're relying on the table extension to catalog the columns available in each item: https://github.com/stac-extensions/table
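A minimal sketch of an Item with a parquet asset and table-extension column metadata might look like this. It uses the `table:columns` and `table:primary_geometry` fields as I understand the extension; the schema version in the URL, the `application/x-parquet` media type (commonly used, but check current best practices), the item ID, href, and column list are all assumptions, so consult the extension's JSON schema before relying on it.

```python
import json

# Schema URL for the table extension; the version number is an assumption.
TABLE_SCHEMA = "https://stac-extensions.github.io/table/v1.2.0/schema.json"


def add_table_metadata(item: dict, columns: list, primary_geometry: str) -> dict:
    """Attach table-extension column metadata to a STAC Item (in place)."""
    names = [name for name, _ in columns]
    if primary_geometry not in names:
        raise ValueError(f"primary geometry column {primary_geometry!r} not in columns")
    exts = item.setdefault("stac_extensions", [])
    if TABLE_SCHEMA not in exts:
        exts.append(TABLE_SCHEMA)
    item["properties"]["table:columns"] = [{"name": n, "type": t} for n, t in columns]
    item["properties"]["table:primary_geometry"] = primary_geometry
    return item


item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "census-tracts",  # hypothetical
    "geometry": None,
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            # Hypothetical root of a (possibly partitioned) parquet dataset.
            "href": "https://example.com/census/tracts.parquet",
            "type": "application/x-parquet",
            "roles": ["data"],
        }
    },
}
add_table_metadata(item, [("GEOID", "string"), ("population", "int64"), ("geometry", "binary")], "geometry")
print(json.dumps(item["properties"], indent=2))
```

Here the geometry is just one column among others, which matches the tabular point of view above: the Item describes the table, and the asset holds the rows.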

TomAugspurger • Oct 25 '21 14:10

Quick recap from a recent telco with LGLN:

  • If you don't have large chunks of vector data in a single file or so, it doesn't make a lot of sense to use static STAC catalogs.
  • The most logical approach seems to be that you already have, or move to, a database solution and a STAC API that can query the vector data through Item Search.
  • The STAC API should implement OGC API - Features. Depending on how the data is or should be made available, the vector data can either be in the returned GeoJSON directly or be exposed as an asset.
    • If it's in the GeoJSON directly, STAC would still encourage users to link to the Item itself so that tooling can download the vector data.
    • Still, the GeoJSON returned for OGC API - Features itself can replicate as much as possible of the vector data.
  • Item search can easily be used to search for the vector data. This can be mixed with raster data.
  • We should introduce extensions for this.
    • At least a vector extension with a field such as vector: true or type: ['vector'] should be defined so that you can include vector data in, or exclude it from, searches. (By the way @TomAugspurger, this could also be useful for the table extension and for general raster data.)
    • Additional extensions might be required for different types of vector data; for example, we discussed building footprints that have specific metadata assigned. Some of them can be generic for all STAC implementers, but others may need to be country- or region-specific (e.g. for Lower Saxony, i.e. LGLN-specific).
  • STAC Collections can pretty much be used as normal and use the extensions described above.
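The "include or exclude vector data from search" idea can be sketched like this. No vector extension exists yet, so the `vector` property is hypothetical and stands in for whatever flag such an extension would define; the `filter`/`filter-lang` members assume an API that supports the STAC API Filter extension with CQL2-JSON. The collection IDs and bbox are made up for illustration, and `split_by_kind` is a client-side fallback for APIs without filter support.

```python
import json


def vector_search_body(collections: list, bbox: list, vector_only: bool) -> dict:
    """Build a STAC API Item Search (POST) body, optionally restricted to vector items.

    The 'vector' property is hypothetical; the 'filter' member requires an API
    that supports the Filter extension with CQL2-JSON.
    """
    body = {"collections": collections, "bbox": bbox, "limit": 100}
    if vector_only:
        body["filter-lang"] = "cql2-json"
        body["filter"] = {"op": "=", "args": [{"property": "vector"}, True]}
    return body


def split_by_kind(items: list) -> tuple:
    """Client-side fallback: separate items flagged as vector from everything else."""
    vector = [i for i in items if i.get("properties", {}).get("vector") is True]
    other = [i for i in items if i.get("properties", {}).get("vector") is not True]
    return vector, other


# Mixed search over a hypothetical vector collection and a raster collection.
body = vector_search_body(["buildings", "orthophotos"], [9.6, 52.3, 9.9, 52.45], vector_only=True)
print(json.dumps(body, indent=2))
```

Because the flag is just an Item property, the same search can mix vector and raster collections and narrow down to one kind only when needed.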

Did I miss anything @McSurf84 ?

The new website should elaborate more on vector data. Currently, it says at https://stacspec.org/faq.html:

Q. I have vector data, should I use STAC? A. No. Vector data should be handled directly with WFS 3.

We found this to be pretty confusing (and outdated).

m-mohr • Nov 15 '21 09:11

I recently changed this to:

Yes! Vector data can in principle be handled with STAC, but it's not as well defined as for raster data. STAC is closely aligned with OGC API - Features, though, and you should have a look at that specification, too.

The previous explanation was wrong and confusing, and it was the most-cited part of the STAC website that I had seen.

m-mohr • Apr 08 '22 16:04

A nice general guideline from @matthewhanson is:

  • If your geometry is the data, use OGC API - Features
  • If your geometry is the metadata, use STAC

Nevertheless, vector data (especially larger sets of geometries) is generally supported by STAC, e.g. as a GeoParquet asset. The question is more about how you expose or enrich the assets with metadata, for example with the table extension. That's a data format issue that probably needs separate issues per data format and/or per extension.

m-mohr • Jul 11 '23 15:07