Equivalent words and <alt> tags

Open brianwarner opened this issue 2 months ago • 1 comments

Hi all,

Is there a reason for keeping a separate equivalent words document, as opposed to incorporating the equivalents directly into the XML files as <alt> tags?

I see from the spec that "XML files do not require specific markup to implement this guideline." However, doesn't this split the source of truth for optional text across two disjoint locations?

I'd have a similar question about "©", "(c)", or "Copyright", and "http://" or "https://".

Oct 30 '25 22:10 brianwarner

Adding <alt> tags for every use of these terms would require updating 700+ files. We don't have enough volunteers to do this manually ;).

From what I recall, we discussed creating a tool that generates <alt> tags by pre-processing a different source, but then we would be maintaining two sets of sources: one for the input and one for the generated XML. We concluded it was more reasonable to maintain a separate file for the equivalent words than to generate another full set of license files. Besides, someone would have to write the utility to pre-process the XML. If we were to take this approach, I would strongly suggest we keep the license-list-XML repo as is and store the generated XML files in the license-list-data repo, where all the other generated forms of the license list exist.

@swinslow may recall more of the discussions.

In the license matching code that @pmonks and I maintain for the SPDX Java Library and the SPDX online tools, we normalize both the license text and the input text being compared rather than doing a regex-based pattern match against all possible patterns. This seems to be faster, especially if you "pre-normalize" the license text. Processing the <alt> tags with user-supplied regular expressions can be rather compute-intensive. So I would prefer to use the list of equivalent words rather than <alt> tags for the Java implementation.
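
To make the "normalize both sides, then compare" idea concrete, here is a minimal Python sketch. It is not the code from the SPDX Java Library; the function names, the tiny word table, and the exact normalization steps are assumptions chosen for illustration only.

```python
import re

# Hypothetical excerpt of equivalent-word pairs (variant -> canonical form).
# These entries are illustrative, not the full published list.
EQUIVALENT_WORDS = {
    "licence": "license",
    "organisation": "organization",
}

COPYRIGHT_SYMBOL = re.compile(r"©|\(c\)")
HTTPS_SCHEME = re.compile(r"https://")
WORD = re.compile(r"[a-z]+")
WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Map equivalent spellings and symbols onto one canonical form."""
    text = text.lower()
    # Treat "©" and "(c)" the same as the word "copyright".
    text = COPYRIGHT_SYMBOL.sub("copyright", text)
    # Treat "https://" the same as "http://".
    text = HTTPS_SCHEME.sub("http://", text)
    # Replace each word that has a canonical equivalent; leave others alone.
    text = WORD.sub(lambda m: EQUIVALENT_WORDS.get(m.group(0), m.group(0)), text)
    # Collapse runs of whitespace so line wrapping does not matter.
    return WHITESPACE.sub(" ", text).strip()


def texts_match(normalized_license_text: str, candidate: str) -> bool:
    """Compare a pre-normalized license text with a raw candidate text."""
    return normalized_license_text == normalize(candidate)


# The license text is normalized once up front ("pre-normalized") ...
reference = normalize("Copyright 2025 Example Organization. See http://example.org/license")
# ... and each candidate is normalized once at compare time.
candidate = "(C) 2025 Example Organisation.\nSee https://example.org/LICENCE"
print(texts_match(reference, candidate))
```

The point of the sketch is that the equivalent-word table is applied in a single pass over each text, so the cost does not depend on how many optional patterns the license XML contains.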

Oct 30 '25 23:10 goneall