support STAC specification

Open tomkralidis opened this issue 6 years ago • 37 comments

Implement the STAC API specification to support search/discovery of geospatial assets. Notes for implementation based on initial discussions with @matthewhanson :

  • add front end routes (/stac/search)
  • update pygeoapi.api to address request handling
  • implement STAC provider backend via plugin mechanism which will interrogate backend and return results as a Python dictionary for marshalling to JSON proper to the client
  • add stubs for transactional capability
  • populating a STAC backend could be via workflow beyond pygeoapi (i.e. implement a CLI within the backend, which is not hooked into pygeoapi tooling proper but can be run offline just the same)
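
A minimal sketch of the provider idea in the notes above, assuming a hypothetical `InMemorySTACProvider` class (not part of pygeoapi) that holds STAC Items in memory. The only point it illustrates is the contract described in the third bullet: the backend is interrogated and hands back a plain Python dictionary that the API layer can serialize to JSON.

```python
import json
from typing import Any, Dict, List, Optional


def _intersects(a: List[float], b: List[float]) -> bool:
    """True when two [minx, miny, maxx, maxy] boxes overlap."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


class InMemorySTACProvider:
    """Hypothetical provider: answers searches from a list of STAC Items."""

    def __init__(self, items: List[Dict[str, Any]]):
        self.items = items

    def query(self, bbox: Optional[List[float]] = None,
              limit: int = 10) -> Dict[str, Any]:
        """Return matching items as a GeoJSON FeatureCollection dict."""
        matched = [item for item in self.items
                   if bbox is None or _intersects(item["bbox"], bbox)]
        returned = matched[:limit]
        return {
            "type": "FeatureCollection",
            "features": returned,
            "numberMatched": len(matched),
            "numberReturned": len(returned),
        }


if __name__ == "__main__":
    provider = InMemorySTACProvider([
        {"type": "Feature", "id": "a", "bbox": [0, 0, 1, 1]},
        {"type": "Feature", "id": "b", "bbox": [10, 10, 11, 11]},
    ])
    # The API layer would marshal the returned dict to JSON for the client.
    print(json.dumps(provider.query(bbox=[-1, -1, 2, 2]), indent=2))
```

A real backend would replace the in-memory list with calls to whatever store holds the Items; the return shape is the part that matters to the API layer.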

tomkralidis avatar Sep 02 '19 20:09 tomkralidis

I'm curious to see how STAC, ogcapi-coverage and ogcapi-records operate together on a single endpoint: which aspects they can share and where the challenges lie. This would be good input for the upcoming sprint.

pvgenuchten avatar Sep 09 '19 07:09 pvgenuchten

Implement the STAC API specification to support search/discovery of geospatial assets. Notes for implementation based on initial discussions with @matthewhanson :

  • add front end routes (/stac/search)
  • update pygeoapi.api to address request handling
  • implement STAC provider backend via plugin mechanism which will interrogate backend and return results as a Python dictionary for marshalling to JSON proper to the client

Are you considering external libraries for marshalling? Or do we need to implement our own?

  • add stubs for transactional capability
  • populating a STAC backend could be via workflow beyond pygeoapi (i.e. implement a CLI within the backend, which is not hooked into pygeoapi tooling proper but can be run offline just the same)

francbartoli avatar Oct 30 '19 18:10 francbartoli

Are you considering external libraries for marshalling? Or do we need to implement our own?

Good point. I'm guessing a STAC backend could be provided via one of the sat-utils tools (for example), and a STAC backend's mission would be to provide Python dicts of JSON objects back to pygeoapi.api, but this remains to be seen and needs further testing.

tomkralidis avatar Oct 30 '19 19:10 tomkralidis

Are you considering external libraries for marshalling? Or do we need to implement our own?

Good point. I'm guessing a STAC backend could be provided via one of the sat-utils tools (for example), and a STAC backend's mission would be to provide Python dicts of JSON objects back to pygeoapi.api, but this remains to be seen and needs further testing.

Ok that makes sense, thanks @tomkralidis.

francbartoli avatar Oct 30 '19 19:10 francbartoli

@tomkralidis Should the stac/search endpoint be optional in the configuration? I would say yes.

francbartoli avatar Nov 05 '19 19:11 francbartoli

Would this depend on how we describe it in the configuration? For example, is STAC a dataset in the config? Are there other options?

tomkralidis avatar Nov 05 '19 19:11 tomkralidis

Any guidance from the STAC team? Is STAC intended to run alongside OGC APIs in a single ogc-api endpoint, or does it require its own endpoint? In that case, maybe deploy a second instance of pygeoapi in a 'stac' mode?

pvgenuchten avatar Nov 05 '19 21:11 pvgenuchten

@pvgenuchten I would consider its own endpoint as suggested in the bullet above from @tomkralidis (cc @matthewhanson)

francbartoli avatar Nov 05 '19 21:11 francbartoli

Configuration could be something like:

catalogs:
    sat-api:
        provider:
            name: STAC
            data: https://sat-api-dev.developmentseed.org/stac

francbartoli avatar Nov 05 '19 22:11 francbartoli

@francbartoli is the thought that STAC catalog providers would be their own provider architecture (i.e. separate from dataproviders), or that STAC would be a quality of existing data providers? If an elasticsearch backend, for instance, was loaded with STAC Items (perhaps marked in the dataset configuration), then some STAC-specific capabilities could be enabled.

To comment on the above comment:

Is STAC intended to run alongside OGC APIs in a single ogc-api endpoint, or does it require its own endpoint? In that case, maybe deploy a second instance of pygeoapi in a 'stac' mode?

My understanding (which is a bit weaker, since I mostly work with static STACs) is that STAC API contains some additional endpoints:

  • /stac - Simply gets the root catalog.
  • /stac/search - Implemented so that STAC can do more advanced queries via extensions than what OAF currently supports

The idea would be that eventually, with the convergence of the Query/Filter extensions into OAF, the second endpoint would go away.

@matthewhanson could provide more info as I'm basically summarizing what I heard from him yesterday at the STAC sprint.

lossyrob avatar Nov 06 '19 14:11 lossyrob

There is currently a PR up to change those endpoints: https://github.com/radiantearth/stac-spec/pull/632

The /stac endpoint would go away because it's redundant with the root endpoint / - it just returns a STAC catalog, which is the same thing that the root OAF endpoint returns with some additional fields.

The /stac/search endpoint is proposed to be renamed to /items and proposed to OAF as a general cross-collection search endpoint. However, this wouldn't go in until OAF 1.1.

matthewhanson avatar Nov 06 '19 14:11 matthewhanson

Thanks @matthewhanson. So in the meantime we could adopt /items, but that might be confusing for users if it is not yet part of the OAF spec, and we don't know when it will land there.

francbartoli avatar Nov 06 '19 16:11 francbartoli

Right, not sure when it will land, but now it's agreed it's going to be /search not /items

matthewhanson avatar Nov 06 '19 16:11 matthewhanson

@francbartoli is the thought that STAC catalog providers would be their own provider architecture (i.e. separate from dataproviders), or that STAC would be a quality of existing data providers? If an elasticsearch backend, for instance, was loaded with STAC Items (perhaps marked in the dataset configuration), then some STAC-specific capabilities could be enabled.

@lossyrob do you mean something like this below (looking at earth-search)?

datasets:
    cbers4-awfi:
        title: CBERS 4 AWFI Imagery
        description: CBERS 4 AWFI Imagery
        keywords:
            - stac
            - stac-api
            - assets
        links:
            - type: application/json
              rel: collection
              title: information
              href: https://earth-search.aws.element84.com/collections/cbers4-awfi
              hreflang: en-US
        extents:
            spatial:
                bbox: [-180,-90,180,90]
                crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
            temporal:
                begin: null
                end: null  # or empty (either means open ended)
        provider:
            name: STAC
            data: # borrow data architecture from OGR provider
                source_type: ES
                source: ES:http://localhost:9200/cbers4-awfi

We are then implicitly saying that cbers4-awfi is a collection, but at some point we lose the knowledge that it is specifically a STAC one, at least from an OAPIF perspective.

On the other hand, we could have a dedicated architecture like:

catalogues:
    hello-catalogue:
        type: OAPIC (CAT4)???

    sat-api:
        type: STAC
        provider:
            name: STAC
            datasets:
                cbers4-awfi:
                    title: CBERS 4 AWFI Imagery
                    description: CBERS 4 AWFI Imagery
                    keywords:
                        - stac
                        - stac-api
                        - assets
                    links:
                        -   type: application/json
                            rel: collection
                            title: information
                            href: https://earth-search.aws.element84.com/collections/cbers4-awfi
                            hreflang: en-US
                    extents:
                        spatial:
                            bbox: [-180,-90,180,90]
                            crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
                        temporal:
                            begin: null
                            end: null  # or empty (either means open ended)
                    provider:
                        name: STAC
                        data: # borrow data architecture from OGR provider
                            source_type: ES
                            source: ES:http://localhost:9200/cbers4-awfi

Here the concept of collection is nested in the specific provider type. Other options @tomkralidis @pvgenuchten ?

francbartoli avatar Nov 06 '19 18:11 francbartoli

Perhaps /search as the cross collection search reuses the provider plugin approach and is specified like:

catalogues:
    landsat8-aws:
        type: STAC
        title: Landsat 8 AWS catalog
        description: Landsat 8 AWS catalog
        keywords:
            - landsat
        links:
            - type: text/html
              rel: canonical
              title: information
              href: https://registry.opendata.aws/landsat-8/
              hreflang: en-US
        extents:
            spatial:
                bbox: [-180,-90,180,90]
                crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
            temporal:
                begin: 2013-03-18
                end: null  # or empty (either means open ended)
        provider:
            name: Elasticsearch
            data: http://localhost:9200/landsat-aws/FeatureCollection
            id_field: ID

and then /search can be routed to reuse pygeoapi.get_collection_items. In the /search case, collections is a query parameter. So we can either consider searching every endpoint in catalogues in the config, or have a single catalogue with a required collection property that can be queried against. The former would be tricky in terms of how to return multi-collection results in a single FeatureCollection.
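
One way the first option could work is sketched below, purely as an illustration. It assumes each configured catalogue is reduced to a hypothetical callable that returns a FeatureCollection dict (the `Backend` type and `cross_collection_search` function are made up for this example and are not pygeoapi API). Each feature is stamped with the `collection` it came from, which is how STAC Items identify their parent collection, so the merged result can still be one FeatureCollection.

```python
from typing import Any, Callable, Dict, List, Optional

FeatureCollection = Dict[str, Any]
# Hypothetical: a catalogue backend is a callable taking a limit and
# returning a FeatureCollection dict.
Backend = Callable[[int], FeatureCollection]


def cross_collection_search(catalogues: Dict[str, Backend],
                            collections: Optional[List[str]] = None,
                            limit: int = 10) -> FeatureCollection:
    """Query the requested catalogues and merge hits into one result."""
    # No `collections` parameter means "search every configured catalogue".
    names = collections if collections else list(catalogues)
    unknown = [name for name in names if name not in catalogues]
    if unknown:
        raise KeyError(f"unknown collections: {unknown}")

    features: List[Dict[str, Any]] = []
    for name in names:
        for feature in catalogues[name](limit)["features"]:
            # Copy the feature and record which catalogue it came from.
            features.append({**feature, "collection": name})

    features = features[:limit]
    return {
        "type": "FeatureCollection",
        "features": features,
        "numberReturned": len(features),
    }


if __name__ == "__main__":
    demo = {
        "landsat8-aws": lambda limit: {
            "type": "FeatureCollection",
            "features": [{"type": "Feature", "id": "l1"}],
        },
    }
    print(cross_collection_search(demo, collections=["landsat8-aws"]))
```

Note that truncating the merged list to `limit` favours whichever catalogue is queried first; a fair ordering or paging scheme across backends is exactly the tricky part mentioned above and is not addressed here.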

Thoughts?

tomkralidis avatar Nov 06 '19 18:11 tomkralidis

Considering https://github.com/radiantearth/stac-spec/pull/632#issuecomment-550350731, I imagine this method will search/browse through a server in a Google type of way, returning for example a list of 3 datasets, 5 catalog records and 2 grids. I like it. From the current discussion I get the feeling that the STAC team actually wants to see STAC made available embedded in the OAPI endpoint (and not separately).

Sorry for my unfamiliarity with STAC. Am I understanding correctly that STAC exposes a queryable series of metadata records of sensor observations (imagery) at a given time/location? A client will then be able to extract the relevant fraction of a cloud-optimised GeoTIFF (or alternative source)? To me these cases seem quite similar to what others are designing in OAPI-records, SensorThings and/or OAPI-coverage, so they are either very likely to collide (separate endpoint +1) or, on the other hand, this could be an opportunity to engage with those teams and design a shared model (embedded +1).

Looking forward to hearing your thoughts/ideas.

pvgenuchten avatar Nov 06 '19 22:11 pvgenuchten

WIP in https://github.com/geopython/pygeoapi/tree/stac . Notes:

  • code basically re-uses /collections/items logic along with a filter JSON payload (currently does nothing), and detects /search in order to query catalogues objects / backends in config
  • the concept of a default or cross collection search is still to be determined. Specifying collections works, albeit against a single collection at the moment. If we have 1..n catalogues objects defined in pygeoapi, how would a cross collection search work? If we assume, for example, that all catalogues are backed by something like ES, then one can do cross index searching. Else, we could define a single catalogue in a pygeoapi instance in which all documents to be searched are in that single index, which would work, but is not very pragmatic
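
For the "all catalogues backed by ES" case, a rough sketch of what cross index searching could look like follows. Elasticsearch accepts a comma-separated list of index names in the search URL, so the sketch only builds the URL and request body and does not send anything. The function name `build_cross_index_search`, the index names, and the `geometry` field name are assumptions for illustration; the field name in particular depends on how the index is mapped.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_cross_index_search(base_url: str, indices: List[str],
                             bbox: Optional[List[float]] = None,
                             limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Build the URL and JSON body for one search spanning several indices."""
    if not indices:
        raise ValueError("at least one index is required")

    # Elasticsearch searches several indices when they are comma-separated.
    url = f"{base_url.rstrip('/')}/{','.join(indices)}/_search"

    body: Dict[str, Any] = {"size": limit, "query": {"match_all": {}}}
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        body["query"] = {
            "bool": {
                "filter": [{
                    "geo_shape": {
                        # Assumed name of the geo_shape field in the mapping.
                        "geometry": {
                            "shape": {
                                "type": "envelope",
                                # Envelope order: upper-left, lower-right.
                                "coordinates": [[minx, maxy], [maxx, miny]],
                            },
                            "relation": "intersects",
                        }
                    }
                }]
            }
        }
    return url, body


if __name__ == "__main__":
    url, body = build_cross_index_search(
        "http://localhost:9200", ["landsat-aws", "cbers4-awfi"],
        bbox=[-10, 40, 5, 50])
    print(url)
    print(json.dumps(body, indent=2))
```

Each hit in an Elasticsearch response carries an `_index` field, which could be used to tell the client which catalogue a result came from.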

Note the STAC example here is based on Landsat 8 AWS (tooling hacked together at https://gist.github.com/tomkralidis/3b6263ec9fbd84e6b50d79527dda149f to set up a basic ES index).

tomkralidis avatar Nov 07 '19 02:11 tomkralidis

In GeoNetwork we deploy a specific instance of Elasticsearch for this use case: metadata records, as well as content from WFSs, are indexed in that instance to facilitate cross CSW/WFS search. An administrator indicates which WFSs to crawl.

This approach could also be relevant for pygeoapi. In the case of CSV/shapefiles, pygeoapi could operate against the index for many operations, which would benefit performance.

If an index like Elasticsearch were to become such an essential component, it would be good to provide an abstraction layer, so that users could select their preferred index (or database) to provide this functionality (SOLR, Noise, PostGIS).
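
To make the abstraction layer idea concrete, here is a small sketch, with every name in it (`SearchIndex`, `InMemoryIndex`, `INDEX_BACKENDS`, `load_index`) invented for this example rather than taken from pygeoapi or GeoNetwork. It shows an abstract interface, one trivial backend, and a name-based registry so the backend could be picked from configuration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type

Record = Dict[str, Any]


class SearchIndex(ABC):
    """Hypothetical interface every index backend would implement."""

    @abstractmethod
    def index(self, record: Record) -> None:
        """Add one record to the index."""

    @abstractmethod
    def search(self, text: str, limit: int = 10) -> List[Record]:
        """Return records whose title matches the text."""


class InMemoryIndex(SearchIndex):
    """Toy backend: case-insensitive substring match on properties.title."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def index(self, record: Record) -> None:
        self._records.append(record)

    def search(self, text: str, limit: int = 10) -> List[Record]:
        needle = text.lower()
        hits = [
            record for record in self._records
            if needle in str(record.get("properties", {}).get("title", "")).lower()
        ]
        return hits[:limit]


# Registry mapping a configured name to a backend class; Elasticsearch,
# SOLR or PostGIS backends would be registered here in the same way.
INDEX_BACKENDS: Dict[str, Type[SearchIndex]] = {"memory": InMemoryIndex}


def load_index(name: str) -> SearchIndex:
    """Instantiate the backend registered under the given name."""
    try:
        return INDEX_BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown index backend: {name}") from None


if __name__ == "__main__":
    idx = load_index("memory")
    idx.index({"id": "1", "properties": {"title": "Landsat 8 scene"}})
    print(idx.search("landsat"))
```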

pvgenuchten avatar Nov 07 '19 07:11 pvgenuchten

@francbartoli I'm a bit unclear what the best path is on the configuration side, but I think that's due to my lack of familiarity with pygeoapi. Tom's WIP branch looks like it's on the right track though!

lossyrob avatar Nov 07 '19 18:11 lossyrob

Update: current work in https://github.com/geopython/pygeoapi/tree/stac

tomkralidis avatar Jan 13 '20 13:01 tomkralidis

FYI functionality merged in #389. Keeping open for STAC API implementation.

tomkralidis avatar Apr 12 '20 22:04 tomkralidis

@tomkralidis Any news on the implementation of the /search endpoint?

The stac branch doesn't seem to exist anymore, but apparently there was some WIP toward adding this functionality.

ricardogsilva avatar Oct 07 '20 23:10 ricardogsilva

@ricardogsilva in the stac branch there was a basic Elasticsearch provider which became dated. With OGC API - Records evolving, we decided to wait on implementing STAC API until it becomes clearer how OARec will relate to STAC /search.

tomkralidis avatar Oct 08 '20 00:10 tomkralidis

Hi All,

I hate to dig up an old post, but has any /search feature been added (i.e. like https://stacspec.org/STAC-api.html#operation/getSearchSTAC)? We have just set up pygeoapi, and it still seems not to be available.

Thanks!

gnosys-tmiller avatar May 09 '22 21:05 gnosys-tmiller

@gnosys-tmiller I have a pending branch/PR to implement STAC API, which should be completed in the next 2 weeks or so. cc @cholmes.

tomkralidis avatar May 11 '22 01:05 tomkralidis

Hey @tomkralidis is your WIP allowing an existing STAC API to be browsed from within pygeoapi, or for pygeoapi itself to act as a STAC API?

Also, this is labeled "help wanted" - what can be done to help? :wink:

bkanuka avatar May 12 '22 14:05 bkanuka

I too am interested in the intersection of STAC and pygeoapi. Any links to what support can be offered? Happy to dig in and help.

jlaura avatar Oct 11 '22 18:10 jlaura

Any progress here @tomkralidis? Can we lend a hand getting this over the finish line?

dblodgett-usgs avatar Nov 16 '23 14:11 dblodgett-usgs

As per RFC4, this Issue has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] avatar Mar 10 '24 21:03 github-actions[bot]

As per RFC4, this Issue has been closed due to there being no activity for more than 90 days.

github-actions[bot] avatar Mar 24 '24 03:03 github-actions[bot]