NuGetSupport: Replace license URLs with SWH content hashes
Many NuGet packages only declare license URLs which, in contrast to license names, cannot reliably be mapped to SPDX license expressions as the content the URL refers to might change over time.
To support a reliable mapping if only a license URL is present, do not
use the URL itself as a declared license, but replace it with an
ORT-specific SPDX LicenseRef that includes the Software Heritage
hash over the content of the file the license URL refers to.
Signed-off-by: Sebastian Schuberth [email protected]
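To illustrate the idea in the description above, here is a minimal Python sketch of how such an identifier could be derived. The Software Heritage hash for file content is the Git blob SHA-1 (the SHA-1 over a `blob <length>\0` header followed by the bytes). The function names and the `LicenseRef-ort-<filename>-<hash>` naming scheme are assumptions for illustration only and may differ from what this PR actually generates.

```python
import hashlib
import re
from urllib.parse import urlparse


def swh_content_hash(content: bytes) -> str:
    """Return the Git blob SHA-1, which Software Heritage uses for content objects."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def license_ref_for(url: str, content: bytes) -> str:
    """Build a hypothetical LicenseRef from the URL's file name and the content hash."""
    # Take the last path segment as the file name; fall back to a placeholder.
    filename = urlparse(url).path.rsplit("/", 1)[-1] or "license"
    # SPDX LicenseRef idstrings may only contain letters, digits, "." and "-".
    safe_name = re.sub(r"[^A-Za-z0-9.-]", "-", filename)
    # Assumed naming scheme, not necessarily the one used by ORT.
    return f"LicenseRef-ort-{safe_name}-{swh_content_hash(content)}"
```

With a scheme like this, two different URLs that serve the same file name and the same bytes yield the same identifier, while a change to the content behind a URL yields a new one.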
An alternative solution could be: Extend declared license mapping to allow specifying a content hash.
Isn't that exactly what's being done with e.g. https://github.com/oss-review-toolkit/ort/pull/5259/files#diff-4e93601aa35e18daab883e47b59f0e54cc0614a6b0d76e3f60acb78f8965f57aR321? Or do you mean a simpler syntax that does not incorporate the content hash into a LicenseRef, but just lists the content hash?
I meant a solution which works without creating new generated license identifiers per distinct (url, hash).
IMO that's the case already for the current proposal. New entries are only generated for each new (filename, hash) pair, but not for each (url, hash) pair. And I'd also be fine with dropping the filename from the LicenseRef.
Edit: That actually goes a bit into the direction of this old proposal.
I believe using the content hash from time = now() is wrong anyway, because what matters is the content at the point in time the artifact was created, not the content from now(). Either the content from back then is unknown or it can be found manually, e.g. using the Wayback Machine. Since that process is manual anyway, the result could just be encoded using a declared license mapping. I simply do not understand how this proposal helps solve the problem in a better way.
The original goal of this PR was not to implement a way to have declared license mappings for the contents of the license from back when the package was published, but, as a first step, to simply have something like "permalinks" to the current content of the license. Finding out the right permalink for a previous version of the contents could then be done as a second step.
The original goal of this PR was not to implement a way to have declared license mappings for the contents of the license from back when the package was published,
From the PR description (following excerpt), I got that this is being proposed in order to support a reliable mapping. However, as I pointed out, the (current) content is not relevant to the mapping.
To support a reliable mapping if only a license URL is present, do not use the URL itself as a declared license, but replace it with an ORT-specific SPDX LicenseRef that includes the Software Heritage hash over the content of the file the license URL refers to.
From the PR description (following excerpt), I got that this is being proposed in order to support a reliable mapping.
I see how this can be misleading depending on individual expectations, but "reliable" was meant more in the sense of "deterministic" with regard to the content a URL points to.