
Extending `providers` values for remote/decentralized data

Open fmigneault opened this issue 1 month ago • 3 comments

This concerns the provider roles defined in the common metadata: https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md#roles

The corresponding enum in the provider JSON Schema: https://github.com/radiantearth/stac-spec/blob/ec002bb93dbfa47976822def8f11b2861775b662/item-spec/json-schema/provider.json#L25-L37

The available roles allow representing self-hosting of derived products. However, given that STAC can point to assets hosted anywhere (including decentralized nodes), a STAC catalog, collection, or item that simply indexes existing data from alternate source locations does not have a clear way to indicate its own role.

For example, we intend to provide "augmented" STAC metadata parsed from non-STAC data sources, so that users can search and retrieve the data from a common location. However, we are not going to duplicate the data (therefore neither host nor producer), nor modify it (therefore not processor), and we do not own it (therefore not licensor). A network of decentralized nodes could all refer to a common data source while none of them hosts or duplicates the reference data, which ensures that they share the same unique "source of truth".

Could the enum be extended to include `indexer`, or other similar roles for remote server references?

Current workarounds include using `rel: alternate` links and https://github.com/stac-extensions/contacts, but those are not as explicit about the "data provisioning" aspect, and are mixed with other non-data-specific contact details.
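
To make the request concrete, here is a minimal sketch of what a `providers` list could look like with such a role. The `indexer` role is hypothetical (it is not part of the STAC spec), and the provider names, URLs, and the `add_indexer` helper are placeholders for illustration only. The helper inserts the indexer before a trailing host so that the host remains the last element of the list.

```python
def add_indexer(providers, name, url):
    """Return a copy of `providers` with an indexer entry added.

    The entry is inserted before a trailing host entry, if there is one,
    so that the host stays last in the list.
    """
    # "indexer" is the hypothetical role proposed in this issue.
    entry = {"name": name, "roles": ["indexer"], "url": url}
    result = list(providers)
    if result and "host" in result[-1].get("roles", []):
        result.insert(len(result) - 1, entry)
    else:
        result.append(entry)
    return result


# Placeholder provider describing the original data source.
source_providers = [
    {
        "name": "Example Data Agency",
        "roles": ["producer", "licensor", "host"],
        "url": "https://data.example.org",
    },
]

updated = add_indexer(
    source_providers, "Example Index Node", "https://index.example.org"
)
print([p["name"] for p in updated])
# ['Example Index Node', 'Example Data Agency']
```

The indexing node describes itself without claiming any of the four existing roles, and the original provider entry is left unchanged.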

fmigneault, Oct 31 '25 16:10

This proposal makes a lot of sense to me. I think you would want to update the descriptive text for Providers in the stac-spec since it really emphasizes processing or altering the data.

jsignell, Nov 07 '25 21:11

In STAC Index we don't actually alter the metadata. People will find it through the indexer anyway, so I didn't see a point in adding it specifically as an additional provider. If I did go for it, I'd honestly just use "host". If you index it, you host the metadata. I don't think host must specifically be about the assets only.

If we add a new role to the enum, is that a breaking change that needs 2.0? A client not expecting it could break, right?
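
A rough sketch of the concern, not taken from any actual client: the two hypothetical functions below contrast a client that rejects roles outside the current enum with one that tolerates them. `KNOWN_ROLES` lists the four roles in the spec today, and `indexer` is the role proposed in this issue.

```python
KNOWN_ROLES = {"licensor", "producer", "processor", "host"}


def parse_roles_strict(roles):
    """Reject any role outside the known enum, as strict validation would."""
    unknown = [r for r in roles if r not in KNOWN_ROLES]
    if unknown:
        raise ValueError(f"unknown provider roles: {unknown}")
    return list(roles)


def parse_roles_tolerant(roles):
    """Split roles into recognized and unrecognized values without failing."""
    known = [r for r in roles if r in KNOWN_ROLES]
    unknown = [r for r in roles if r not in KNOWN_ROLES]
    return known, unknown


roles = ["host", "indexer"]
print(parse_roles_tolerant(roles))  # (['host'], ['indexer'])

try:
    parse_roles_strict(roles)
except ValueError as err:
    print(err)  # unknown provider roles: ['indexer']
```

Only clients written like the strict variant would be affected by a new enum value.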

m-mohr, Nov 12 '25 16:11

The current "host" definition does not allow using it for that purpose:

> host: The host is the actual provider offering the data on their storage. There should be no more than one host, specified as the last element of the provider list.

Because:

  1. The indexing server is not the actual data storage location.
  2. Since the actual host storage should still be indicated, another host (or multiple ones in a decentralized network) would be invalid and confusing.
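
The second point can be sketched as a small check of the quoted `host` definition. The `host_issues` helper and the provider names are hypothetical, and since the definition says "should", this is a lint-style check under that reading rather than something the schema is known to enforce.

```python
def host_issues(providers):
    """Return a list of problems with `host` entries in a providers list."""
    host_positions = [
        i for i, p in enumerate(providers) if "host" in p.get("roles", [])
    ]
    issues = []
    if len(host_positions) > 1:
        issues.append("more than one host")
    if host_positions and host_positions[-1] != len(providers) - 1:
        issues.append("host is not the last provider")
    return issues


# An indexing node reusing "host" next to the real storage host.
providers = [
    {"name": "Example Data Agency", "roles": ["producer", "host"]},
    {"name": "Example Index Node", "roles": ["host"]},
]
print(host_issues(providers))  # ['more than one host']
```

With several indexing nodes in a decentralized network, each reusing "host", the list would carry as many extra hosts as there are nodes.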

fmigneault, Nov 12 '25 20:11