Migrate from SPDX template files to SPDX XML
SPDX has migrated from text template files to XML. See https://spdx.github.io/spdx-spec/v3.0.1/annexes/license-matching-guidelines-and-templates/#legacy-text-template-format.
We should switch over as well.
For the migration, we would need a licenses.json-like XML file containing all the licenses, right? I think I found one; please check https://github.com/spdx/license-list-data/tree/main/rdfxml
Also, if I'm not wrong, the basic workflow stays the same, and we would just fetch from this new XML/RDF file instead?
The only difference is the code that handles fetching and parsing: it will need to handle XML instead of JSON.
Does this sound like the correct approach to you? Just want to double check before I begin the implementation.
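For reference, here is a minimal sketch of the parsing side, written in Python for brevity (in the extension the same steps would go through fetch and DOMParser). It assumes the layout of the files in license-list-XML: a root SPDXLicenseCollection element in the http://www.spdx.org/license namespace, containing a license element with licenseId and name attributes and a text element. The helper name parse_license_xml and the sample are hypothetical. The main difference from the JSON flow is that the matching markup (optional and alt elements) lives inside the license text, so the parser has to walk the tree rather than read a ready-made string.

```python
import xml.etree.ElementTree as ET

# Namespace used by the license-list-XML files (assumed here; confirm against a real file).
NS = {"spdx": "http://www.spdx.org/license"}


def parse_license_xml(xml_text: str) -> dict:
    """Hypothetical helper: extract what a matcher needs from one license XML file."""
    root = ET.fromstring(xml_text)
    lic = root.find("spdx:license", NS)
    if lic is None:
        raise ValueError("no <license> element found")
    text_el = lic.find("spdx:text", NS)
    # itertext() flattens nested <p>, <optional>, <alt>, ... into one string;
    # split/join then collapses the indentation whitespace.
    raw = "".join(text_el.itertext()) if text_el is not None else ""
    return {
        "id": lic.get("licenseId"),
        "name": lic.get("name"),
        "text": " ".join(raw.split()),
        # Matching markup that replaces the legacy <<beginOptional>> / <<var;...>> syntax.
        "optional_count": len(lic.findall(".//spdx:optional", NS)),
        "alt_count": len(lic.findall(".//spdx:alt", NS)),
    }


# Trimmed, made-up sample in the assumed layout (not a real license file).
SAMPLE = """<SPDXLicenseCollection xmlns="http://www.spdx.org/license">
  <license licenseId="MIT" name="MIT License">
    <text>
      <titleText><p>MIT License</p></titleText>
      <p>Copyright (c) <alt name="copyright" match=".+">year copyright holders</alt></p>
      <p>Permission is hereby granted, free of charge.</p>
      <optional><p>Extra notice.</p></optional>
    </text>
  </license>
</SPDXLicenseCollection>"""

print(parse_license_xml(SAMPLE))
```

Flattening the text this way loses which parts were optional or replaceable, so a real matcher would likely keep the tree and handle optional and alt elements explicitly.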
I'd double-check the SPDX spec for a preferred approach, if there is one, so that we follow it. I believe the JSON approach was the recommended one at the time. I haven't checked the JSON recently, but does it contain the XML now? Or is there a new endpoint?
Thanks for checking, I appreciate it! No, unfortunately it doesn't contain XML, and there is no new endpoint either. However, https://github.com/spdx/license-list-data/blob/main/license-list-XML contains all the XML files.
The issue is that this changes the browser host permissions to GitHub, which will be much scarier for people. Let's double-check whether the spec identifies an SPDX-controlled domain for access. Alternatively, @goneall, do you see SPDX offering a new official endpoint, besides GitHub, to iterate over and download the XML files?
Good catch @alandtse
We could enhance the licenseListPublisher to publish the XML files to the website. I was thinking this would be a good idea anyway since it would solve another issue where we may end up picking up XML files before the official release.
I created an issue to track the publisher enhancement: https://github.com/spdx/LicenseListPublisher/issues/215
@bhorsujal - let me know if you're comfortable in Java and would like to help extend the publisher. Otherwise, I can do the copying of individual XML files which is rather straightforward.
Once the publisher is enhanced, I can copy over the XML files for the current release so that it is available prior to the next release of the license list.
cc: @swinslow
It's been a while since I reviewed the JSON endpoint, but perhaps it could just add a key to serve the XML links?
I'm okay with basic Java, so I will do it.
We could add the endpoint to the licenses.json file on the website as part of the licenseListPublisher enhancements, similar to the detailsUrl property.
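A minimal sketch of how a client might consume such a key, assuming it sits next to detailsUrl on each entry of licenses.json. The key name xmlUrl and the sample URLs are hypothetical placeholders, not the published format; the helper xml_urls is likewise illustrative.

```python
import json

# Hypothetical property name: the real key would be chosen in the
# LicenseListPublisher enhancement (spdx/LicenseListPublisher#215).
XML_KEY = "xmlUrl"


def xml_urls(licenses_json: str) -> dict[str, str]:
    """Map licenseId -> XML URL for every entry that advertises one."""
    data = json.loads(licenses_json)
    urls = {}
    for entry in data.get("licenses", []):
        url = entry.get(XML_KEY)
        if url:  # entries without the key (e.g. older list versions) are skipped
            urls[entry["licenseId"]] = url
    return urls


# Made-up sample: detailsUrl follows the shape of today's licenses.json, xmlUrl is hypothetical.
SAMPLE_LIST = json.dumps({
    "licenses": [
        {"licenseId": "MIT",
         "detailsUrl": "https://spdx.org/licenses/MIT.json",
         "xmlUrl": "https://spdx.org/licenses/MIT.xml"},
        {"licenseId": "0BSD",
         "detailsUrl": "https://spdx.org/licenses/0BSD.json"},
    ],
})

print(xml_urls(SAMPLE_LIST))
```

Because the key is additive, clients that only read detailsUrl keep working unchanged, and clients that want XML can opt in.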
Hi @alandtse, I've been working with @goneall on the XML file endpoint and ran into some questions. I created a single licenses.xml file (around 8 MB) that combines all the XML license files. Would it be advisable to create an endpoint for that file?
Also, Gary later suggested adding references to the XML files in the current licenses.json. What do you think about performance? I suspect XML will be slower, because JSON parsing is much simpler than running DOMParser on each XML file, but I'm not sure.
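One way to frame the performance question: a single combined file means one large parse, while per-file links mean one fetch and one parse per license. The sketch below, again in Python with made-up data, streams a combined file with xml.etree.ElementTree.iterparse and builds a licenseId-to-name index without keeping every parsed license in memory. It assumes a hypothetical layout for the combined licenses.xml (a single root wrapping license and exception elements in the SPDX namespace), and index_combined is an illustrative name. In the browser, DOMParser.parseFromString takes the whole string at once, so the extension would parse the full document in one go.

```python
import io
import xml.etree.ElementTree as ET

SPDX = "{http://www.spdx.org/license}"  # assumed namespace, as in license-list-XML


def index_combined(xml_bytes: bytes) -> dict[str, str]:
    """Stream a combined licenses.xml and map licenseId -> name.

    Assumes (hypothetically) one root element wrapping many <license> and
    <exception> elements in the SPDX namespace.
    """
    index = {}
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag in (SPDX + "license", SPDX + "exception"):
            index[elem.get("licenseId")] = elem.get("name")
            elem.clear()  # free the processed subtree before reading the next one
    return index


# Made-up sample in the assumed combined layout.
SAMPLE_COMBINED = (
    b'<SPDXLicenseCollection xmlns="http://www.spdx.org/license">'
    b'<license licenseId="MIT" name="MIT License"><text><p>placeholder</p></text></license>'
    b'<exception licenseId="Classpath-exception-2.0" name="Classpath exception 2.0">'
    b'<text><p>placeholder</p></text></exception>'
    b'</SPDXLicenseCollection>'
)

print(index_combined(SAMPLE_COMBINED))
```

Whether one large parse beats many small fetches depends mostly on how many licenses a client actually needs; a client that checks only a few licenses may be better served by per-file links.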
Check out all the licenses and exceptions in XML here
Since you control both ends, do what feels best.
My suggestion to Gary was to add XML links to the existing endpoint so existing users could easily find the new XML versions for download. I only had downloading in mind, so I didn't consider any further parsing.