stac-spec
Collection summaries including Asset-only fields
The collection spec states:
Collections are strongly recommended to provide summaries of the values of fields that they can expect from the properties of STAC Items contained in this Collection.
One interpretation of this is that summaries should only include properties of STAC Items, which would exclude properties that might only exist on the Asset objects of the Item. However, it would be useful to include Asset-only properties in the summaries. For example, the File extension has a file:values property that contains useful information about the classification values of a raster (though this will likely move to another extension). A Collection-level summary of this asset-only property would let users see the class map without having to dig into an Item.
Should asset-only properties be allowed in Collection summaries? If so, should they be treated the same as Item properties, with the property existing as a top-level property in summaries (e.g. "summaries": { "file:values": [ ... ] })? What about if multiple assets have the same property value; should there be something which differentiates between the assets whose properties are being summarized?
I'd vote for treating Asset-only properties the same as Item properties; if multiple Assets implement a property, their values could all be summarized together without differentiating between the Assets, and the summary would remain valid IMO.
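A minimal Python sketch of that approach, assuming Items are already loaded as plain dicts: the values of a field are collected from each Item's properties and from every one of its Assets, without recording which Asset they came from. The helper name summarize_values and the sample Item are made up for illustration, and flattening array-valued fields into one list (as instruments is summarized elsewhere in this thread) is an assumption of the sketch.

```python
from typing import Any


def summarize_values(items: list[dict[str, Any]], field: str) -> list[Any]:
    """Distinct values of `field` across Item properties and all Assets.

    Hypothetical helper: Asset-level and Item-level values are merged
    without differentiating between the Assets they came from.
    """
    distinct: list[Any] = []
    for item in items:
        # Item properties first, then every Asset object of the Item
        sources = [item.get("properties", {}), *item.get("assets", {}).values()]
        for obj in sources:
            if field not in obj:
                continue
            value = obj[field]
            # Assumption: array-valued fields are flattened into one list of values
            for v in value if isinstance(value, list) else [value]:
                # A list membership test also works for unhashable values (dicts)
                if v not in distinct:
                    distinct.append(v)
    return distinct


# Illustrative Item; the file:values entries are sample data, not real classes
items = [
    {
        "properties": {"instruments": ["EAGLE IV"]},
        "assets": {
            "classification": {"file:values": [{"values": [0], "summary": "No data"},
                                               {"values": [1], "summary": "Water"}]},
            "overview": {"file:values": [{"values": [1], "summary": "Water"}]},
        },
    }
]
summaries = {
    "instruments": summarize_values(items, "instruments"),
    "file:values": summarize_values(items, "file:values"),
}
```

Because the duplicate "Water" entry from the second Asset is dropped, the summary stays a single set of values, which is why the result would remain a valid summary without per-Asset differentiation.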
Summaries of asset properties (and possibly link properties) would be useful for us in the Radiant MLHub API.
We have heard from a few users that it would be nice to have a collection-level summary of the file formats of the source imagery, so that users could search for collections based on formats that fit into their existing workflows. We use the Label Extension and typically have separate source imagery and label collections. In the source imagery collection, we would probably want to summarize the Asset media type (type property) of any data assets in the collection. One problem we might run into here is distinguishing between Item- and Asset-level properties of the same name (e.g. type). There is obviously no reason to summarize the Item-level type property (since it will always be "Feature"), but we may still want a mechanism to make it clear in these kinds of cases.
What about if multiple assets have the same property value; should there be something which differentiates between the assets whose properties are being summarized?
Not sure if this is the same issue, but in our case we would only be summarizing the media type for "data" assets (i.e. not thumbnails and other assets), so it might be good to indicate this somehow in the summaries. I'm not sure what the best/clearest way is to do this, though.
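One way to express that restriction, sketched in Python under the assumption that Items are plain dicts: only Assets whose roles include "data" are considered, and the media type is read from each Asset's type field, never from the Item's own type. The helper name and sample Items are hypothetical, and where the result would live in summaries is left open, as in the discussion.

```python
from typing import Any


def data_asset_media_types(items: list[dict[str, Any]]) -> list[str]:
    """Distinct Asset media types, restricted to Assets with the "data" role."""
    media_types: list[str] = []
    for item in items:
        # Only Asset objects are read, so the Item-level "type" ("Feature") is never picked up
        for asset in item.get("assets", {}).values():
            # Skip thumbnails, metadata, etc.: require "data" in the Asset's roles
            if "data" not in asset.get("roles", []):
                continue
            media_type = asset.get("type")
            if media_type is not None and media_type not in media_types:
                media_types.append(media_type)
    return media_types


# Illustrative Items (made-up sample data)
items = [
    {
        "type": "Feature",
        "properties": {},
        "assets": {
            "image": {"type": "image/tiff; application=geotiff", "roles": ["data"]},
            "thumbnail": {"type": "image/png", "roles": ["thumbnail"]},
        },
    },
    {
        "type": "Feature",
        "properties": {},
        "assets": {"image": {"type": "image/jp2", "roles": ["data"]}},
    },
]
print(data_asset_media_types(items))  # ['image/tiff; application=geotiff', 'image/jp2']
```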
This probably goes beyond the scope of what this issue is discussing, but ideally we would have a way of summarizing the Asset media type (type property) associated with the assets listed in the label:assets property of Links with "rel": "source", as defined in the Label Extension "Links: source imagery" section. This is a bit tricky, since it would require getting the label:assets property from links in a label collection and then summarizing the media type of the assets associated with the Items that those Links point to (which could be in a totally different collection).
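A rough Python sketch of that two-step lookup, assuming the label Items and the linked source Items have already been fetched as plain dicts. The source_items_by_href mapping (link href to source Item) is a hypothetical stand-in for however a client would resolve those links across collections; the URLs and Items below are made up.

```python
from typing import Any


def source_media_types(
    label_items: list[dict[str, Any]],
    source_items_by_href: dict[str, dict[str, Any]],
) -> list[str]:
    """Media types of the source Assets named in label:assets on "source" links.

    `source_items_by_href` is a hypothetical lookup of already-fetched source
    Items keyed by link href; the sources may live in a different collection.
    """
    media_types: list[str] = []
    for label_item in label_items:
        for link in label_item.get("links", []):
            if link.get("rel") != "source":
                continue
            source_item = source_items_by_href.get(link.get("href", ""))
            if source_item is None:
                continue  # linked Item not fetched: skip it rather than guess
            # label:assets lists the keys of the source Item's Assets that were labelled
            for key in link.get("label:assets", []):
                asset = source_item.get("assets", {}).get(key, {})
                media_type = asset.get("type")
                if media_type is not None and media_type not in media_types:
                    media_types.append(media_type)
    return media_types


# Illustrative data with placeholder URLs
label_items = [{
    "links": [
        {"rel": "source", "href": "https://example.com/source/item-1.json",
         "label:assets": ["visual"]},
        {"rel": "self", "href": "https://example.com/labels/label-1.json"},
    ]
}]
source_items_by_href = {
    "https://example.com/source/item-1.json": {
        "assets": {
            "visual": {"type": "image/tiff; application=geotiff"},
            "thumbnail": {"type": "image/png"},
        }
    }
}
print(source_media_types(label_items, source_items_by_href))  # ['image/tiff; application=geotiff']
```

The expensive part in practice is the resolution step the sketch assumes away: every source link has to be followed to an Item that may sit in another collection before any summary can be computed.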
Isn't this (partially?) what the Item Asset Definition Extension is about? https://github.com/stac-extensions/item-assets
One problem we might run into here is distinguishing between Item- and Asset-level properties of the same name (e.g. type). There is obviously no reason to summarize the Item-level type property (since it will always be "Feature"), but we may still want a mechanism to make it clear in these kinds of cases.
Yes, I think this mechanism is necessary. For example, the Common Metadata notes that created and updated can be used on both Items and Assets, and there may be a need to summarise both. Raised via https://github.com/radiantearth/stac-spec/discussions/1156.
We are thinking of progressing this in our custom STAC extensions, but wanted to check whether our approach would align with future STAC core changes as much as possible. Can I get some feedback on this way of summarising asset metadata?
A summary of Asset created and updated could look like this in collection.json:
{
  "summaries": {
    "assets": {
      "created": {
        "minimum": "1901-01-01T00:00:00Z",
        "maximum": "1920-01-01T00:00:00Z"
      },
      "updated": {
        "minimum": "1901-01-02T00:00:00Z",
        "maximum": "1920-01-02T00:00:00Z"
      }
    }
  }
}
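A Python sketch of how that nested block could be computed from Items, assuming Items are plain dicts and timestamps are RFC 3339 strings; asset_range_summaries is a hypothetical name and the sample Items are made up so that they reproduce the ranges in the example. Values are read only from Asset objects, so Asset-level created and updated never mix with the Item-level fields of the same name.

```python
from datetime import datetime
from typing import Any


def _parse(timestamp: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def asset_range_summaries(
    items: list[dict[str, Any]], fields: tuple[str, ...] = ("created", "updated")
) -> dict[str, Any]:
    """Build the proposed nested {"assets": {field: {"minimum", "maximum"}}} block.

    Only Asset objects are read, so Item-level created/updated stay separate.
    """
    ranges: dict[str, dict[str, str]] = {}
    for field in fields:
        values = [
            asset[field]
            for item in items
            for asset in item.get("assets", {}).values()
            if field in asset
        ]
        if values:
            # Compare as datetimes, but keep the original strings in the output
            ranges[field] = {"minimum": min(values, key=_parse),
                             "maximum": max(values, key=_parse)}
    return {"assets": ranges}


# Illustrative Items: Item-level dates differ from Asset-level dates on purpose
items = [
    {"properties": {"created": "1999-01-01T00:00:00Z"},
     "assets": {"scan": {"created": "1920-01-01T00:00:00Z", "updated": "1920-01-02T00:00:00Z"}}},
    {"properties": {"created": "2010-01-01T00:00:00Z"},
     "assets": {"scan": {"created": "1901-01-01T00:00:00Z", "updated": "1901-01-02T00:00:00Z"}}},
]
summary = asset_range_summaries(items)
```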
Combined with some Item properties, it would look like this:
{
  "summaries": {
    "assets": {
      "created": {
        "minimum": "1901-01-01T00:00:00Z",
        "maximum": "1920-01-01T00:00:00Z"
      },
      "updated": {
        "minimum": "1901-01-02T00:00:00Z",
        "maximum": "1920-01-02T00:00:00Z"
      }
    },
    "platform": ["Fixed-wing Aircraft"],
    "instruments": ["EAGLE IV"],
    "created": {
      "minimum": "1999-01-01T00:00:00Z",
      "maximum": "2010-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1999-01-02T00:00:00Z",
      "maximum": "2010-01-02T00:00:00Z"
    }
  }
}
So, I think an assets key with summaries nested one level below would "break" a lot of tooling that expects a JSON Schema there. It would certainly not work well in STAC Browser at least, and I think implementations would have a hard time differentiating between the new extension and what is allowed right now. I'd recommend putting the asset summaries into a new field. Or maybe it would fit better with the Item Asset Definition extension? @matthewhanson In that extension you can only set a specific value, not a summary, but maybe it would be worth extending it instead of tampering with the summaries, which were meant for Item properties?
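To illustrate the concern, here is a simplified Python sketch of how a generic client might classify each entry in summaries, based on the three forms the Collection spec allows: an array of values, a range object with minimum and maximum, or a JSON Schema object. This is not STAC Browser's actual logic; summary_kind is a hypothetical helper, and the point is only that a nested assets object has no form of its own and would be read as a JSON Schema.

```python
from typing import Any


def summary_kind(value: Any) -> str:
    """Simplified guess at how a generic client might read a summaries entry."""
    if isinstance(value, list):
        return "set of values"
    if isinstance(value, dict) and "minimum" in value and "maximum" in value:
        return "range"
    if isinstance(value, dict):
        # Any other object is assumed to be a JSON Schema
        return "JSON Schema"
    return "unknown"


# Entries taken from the proposed collection.json above
summaries = {
    "assets": {
        "created": {"minimum": "1901-01-01T00:00:00Z", "maximum": "1920-01-01T00:00:00Z"},
    },
    "platform": ["Fixed-wing Aircraft"],
    "created": {"minimum": "1999-01-01T00:00:00Z", "maximum": "2010-01-01T00:00:00Z"},
}
for key, value in summaries.items():
    print(key, "->", summary_kind(value))
# assets -> JSON Schema
# platform -> set of values
# created -> range
```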
The Item properties as you show them are already supported; that's no problem.
Thanks @m-mohr. How would this work as a new field? I'm happy to suggest it as a pull request on the Item Asset extension if that would help.
{
  "asset_summaries": {
    "created": {
      "minimum": "1901-01-01T00:00:00Z",
      "maximum": "1920-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1901-01-02T00:00:00Z",
      "maximum": "1920-01-02T00:00:00Z"
    }
  },
  "summaries": {
    "platform": ["Fixed-wing Aircraft"],
    "instruments": ["EAGLE IV"],
    "created": {
      "minimum": "1999-01-01T00:00:00Z",
      "maximum": "2010-01-01T00:00:00Z"
    },
    "updated": {
      "minimum": "1999-01-02T00:00:00Z",
      "maximum": "2010-01-02T00:00:00Z"
    }
  }
}
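With the summaries split across two fields, a client can tell Item-level and Asset-level created apart without any new nesting inside summaries. A minimal Python sketch of such a lookup, assuming the proposed asset_summaries field name; get_summary and LEVEL_KEYS are hypothetical names, and asset_summaries is a proposal from this thread, not a standardised field.

```python
from typing import Any

# Assumption: "asset_summaries" is the proposed (not standardised) top-level field
LEVEL_KEYS = {"item": "summaries", "asset": "asset_summaries"}


def get_summary(collection: dict[str, Any], field: str, level: str = "item") -> Any:
    """Look up a summary at the Item or Asset level; returns None if absent."""
    if level not in LEVEL_KEYS:
        raise ValueError(f"level must be one of {sorted(LEVEL_KEYS)}, got {level!r}")
    return collection.get(LEVEL_KEYS[level], {}).get(field)


collection = {
    "asset_summaries": {
        "created": {"minimum": "1901-01-01T00:00:00Z", "maximum": "1920-01-01T00:00:00Z"},
    },
    "summaries": {
        "platform": ["Fixed-wing Aircraft"],
        "created": {"minimum": "1999-01-01T00:00:00Z", "maximum": "2010-01-01T00:00:00Z"},
    },
}
print(get_summary(collection, "created", level="asset")["minimum"])  # 1901-01-01T00:00:00Z
print(get_summary(collection, "created")["minimum"])                 # 1999-01-01T00:00:00Z
```

Because summaries keeps its existing shape under this layout, tools that only know about summaries are unaffected.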
We'd likely need a larger discussion on whether it makes sense to have it in the Item Asset Definition Extension, e.g. on one of the Monday calls. Feel free to join if you can, although I guess time zone differences make it hard for you. I can put it on the agenda and discuss it for you if you can't join.
Otherwise, you could likely come up with a new extension that looks similar to what you've shown above.
If I can make it, I will. I think my email address is public if you want to send an invite there? Otherwise, if I'm not there, please put it on the agenda for me. Thanks.
Okay, @matthewhanson can you invite @billgeo, please?
Participation in the call was low this week, so we postponed it to the next meeting, but in general people agreed on one of the approaches mentioned above. We identified that having it as part of the Item Asset Definition extension could lead to validation issues in JSON Schema.