stac-validator
How to programmatically handle SPDX licenses?
Having a typo in the license info would be a bummer. What would be the best way to validate licenses?
Would https://github.com/spdx/tools be a good idea?
In stac4s we fail decoding if the license is neither a valid SPDX id nor the string "proprietary" (https://github.com/azavea/stac4s/blob/master/modules/core/src/main/scala/com/azavea/stac4s/StacLicense.scala#L39-L49), via spdx-license-checker. I don't know anything about tools-python in spdx-land, but something like it would be pretty helpful for typo protection, I think.
The JSON validation includes light validation of the license field.
I think tools-python would be the best way to validate licenses, but I'm not sure whether that's a PySTAC concern or a user concern. I'm sure there are a lot of fields that could use broader content validation than the JSON Schema provides, but validating those might be a higher-level concern, rather than having PySTAC bring in a library to validate each one. Maybe this is something to consider putting into stac-validator, since it has a somewhat higher-level and more specific validation focus?
I can see pushing it into stac-validator, but I'd also think it would be valuable to avoid creating invalid items. The language in the table for collections is pretty clear:
Collection's license(s), either a SPDX License identifier, "various" if multiple licenses apply or "proprietary" for all other cases.
That's knowable at the time that you construct the collection. I get the goal to minimize dependencies, and the largely unpinned tools-python deps are pretty frightening, so maybe there's another library or strategy available to protect against license typos.
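For illustration, a dependency-free strategy might look roughly like the sketch below. It is only a sketch, not anything PySTAC or stac-validator currently does: `check_license`, `KNOWN_SPDX_IDS`, and `SPECIAL_VALUES` are hypothetical names, and the hard-coded set is a tiny illustrative subset that a real check would replace with the full SPDX license list. It uses the standard library's `difflib` to suggest near matches for a typo.

```python
import difflib

# Illustrative subset only (assumption): a real check would load the
# complete SPDX license list instead of hard-coding a handful of ids.
KNOWN_SPDX_IDS = {"Apache-2.0", "MIT", "CC-BY-4.0", "CC0-1.0", "GPL-3.0-only"}

# The two non-SPDX values the collection spec text above allows.
SPECIAL_VALUES = {"various", "proprietary"}


def check_license(value, known_ids=KNOWN_SPDX_IDS):
    """Return (is_valid, suggestions) for a STAC license string.

    Hypothetical helper: exact, case-sensitive match against the known
    ids and special values; on failure, suggest close SPDX ids.
    """
    if value in SPECIAL_VALUES or value in known_ids:
        return True, []
    suggestions = difflib.get_close_matches(
        value, sorted(known_ids), n=3, cutoff=0.6
    )
    return False, suggestions


if __name__ == "__main__":
    for candidate in ("MIT", "proprietary", "Apache-2"):
        print(candidate, check_license(candidate))
```

The trade-off is that the id list has to be kept up to date by hand or refreshed from the SPDX data, which is the maintenance cost you pay for avoiding the extra dependency.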
We expect to be given STAC files by users and will likely not have control over how they are generated.
Since this ticket was opened, our non-official policy seems to have solidified around pushing all non-jsonschema validation up to stac-validator, so I'm going to transfer this issue to stac-validator. If we do want to pull more advanced validation down to a PySTAC validator, that would be fine, but that non-trivial amount of future work should be tracked in its own ticket.