How to programmatically handle SPDX licenses?

Open schwehr opened this issue 5 years ago • 5 comments

Having a typo in the license info would be a bummer. What would be the best way to validate licenses?

Would https://github.com/spdx/tools be a good idea?

schwehr avatar Sep 17 '20 16:09 schwehr

In stac4s we fail decoding if the license is neither a valid SPDX id nor the string "proprietary" -- https://github.com/azavea/stac4s/blob/master/modules/core/src/main/scala/com/azavea/stac4s/StacLicense.scala#L39-L49 via spdx-license-checker. I don't know anything about tools-python in spdx-land, but something like it would be pretty helpful for typo protection, I think.

jisantuc avatar Sep 17 '20 22:09 jisantuc
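
A minimal Python sketch of the rule described above (accept a known SPDX id or the string "proprietary"). It is not the stac4s implementation and not something PySTAC provides. The names SAMPLE_SPDX_IDS and is_valid_license are hypothetical, and the id set is a tiny illustrative subset; a real check would presumably load the full license list that SPDX publishes.

```python
# Illustrative subset only (assumption): the real SPDX License List
# contains hundreds of identifiers.
SAMPLE_SPDX_IDS = frozenset(
    {"Apache-2.0", "MIT", "CC-BY-4.0", "CC0-1.0", "GPL-3.0-only"}
)


def is_valid_license(value, spdx_ids=SAMPLE_SPDX_IDS):
    """Return True if value is a known SPDX id or the string "proprietary".

    Hypothetical helper: the comparison is an exact, case-sensitive match.
    """
    if not isinstance(value, str):
        return False
    return value == "proprietary" or value in spdx_ids


if __name__ == "__main__":
    for candidate in ("MIT", "proprietary", "Apache 2.0"):
        print(candidate, is_valid_license(candidate))
```

An exact match against a fixed set is the simplest form of typo protection: "Apache 2.0" (with a space) is rejected because only "Apache-2.0" is in the set.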

The JSON validation includes light validation of the license field.

I think tools-python would be the best way to validate licenses, but I'm not sure if that's a PySTAC concern or a user concern. I'm sure there are a lot of fields that could use broader content validation than the JSON Schema provides, but validating those might be a higher-level concern, rather than having PySTAC bring in a library for each one. Maybe something to consider putting into stac-validator, since that has a somewhat higher-level and more specific validation focus?

lossyrob avatar Sep 18 '20 14:09 lossyrob

I can see pushing it into stac-validator, but I'd also think it would be valuable to avoid creating invalid items. The language in the table for collections is pretty clear:

Collection's license(s), either a SPDX License identifier, "various" if multiple licenses apply or "proprietary" for all other cases.

That's knowable at the time that you construct the collection. I get the goal to minimize dependencies, and the largely unpinned tools-python deps are pretty frightening, so maybe there's another library or strategy available to protect against license typos.

jisantuc avatar Sep 18 '20 14:09 jisantuc

We expect to be given STAC files by users and will likely not have control on how they are generated.

schwehr avatar Sep 18 '20 18:09 schwehr
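
For files received from users, one possible approach is to check the license field after the fact and suggest a near match when it looks like a typo. This is a sketch under stated assumptions, not stac-validator or PySTAC behavior: check_license, KNOWN_IDS and SPECIAL are hypothetical names, KNOWN_IDS is a small illustrative subset of SPDX ids, and the input is assumed to be a JSON object with a top-level "license" key. The suggestion uses difflib from the Python standard library.

```python
import difflib
import json

# Illustrative subset only (assumption); a real check would use the full SPDX list.
KNOWN_IDS = ("Apache-2.0", "MIT", "CC-BY-4.0", "CC0-1.0")
# Non-SPDX values allowed by the collection spec text quoted in this thread.
SPECIAL = ("proprietary", "various")


def check_license(stac_json_text, spdx_ids=KNOWN_IDS):
    """Return None if the license field is acceptable, else a message string.

    Assumes stac_json_text decodes to a JSON object (a Python dict).
    """
    doc = json.loads(stac_json_text)
    value = doc.get("license")
    if not isinstance(value, str):
        return "license is missing or not a string"
    allowed = list(spdx_ids) + list(SPECIAL)
    if value in allowed:
        return None
    # Offer the single closest allowed value, if any is similar enough.
    guesses = difflib.get_close_matches(value, allowed, n=1)
    hint = f"; did you mean {guesses[0]!r}?" if guesses else ""
    return f"unknown license {value!r}{hint}"


if __name__ == "__main__":
    print(check_license('{"license": "Apache-2"}'))
```

Returning a message instead of raising lets a caller collect problems across many user-supplied files before deciding whether to reject any of them.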

Since this ticket was opened, our unofficial policy seems to have solidified around pushing all non-jsonschema validation up to stac-validator, so I'm going to transfer this issue to stac-validator. If we do want to pull more advanced validation down into a PySTAC validator, that would be fine, but that non-trivial amount of future work should be tracked in its own ticket.

gadomski avatar Jan 31 '23 14:01 gadomski