NuGetSupport: Replace license URLs with SWH content hashes

Open sschuberth opened this issue 3 years ago • 8 comments

Many NuGet packages only declare license URLs which, in contrast to license names, cannot reliably be mapped to SPDX license expressions as the content the URL refers to might change over time.

To support a reliable mapping if only a license URL is present, do not use the URL itself as a declared license, but replace it with an ORT-specific SPDX LicenseRef that includes the Software Heritage hash over the content of the file the license URL refers to.

Signed-off-by: Sebastian Schuberth [email protected]
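
For illustration only, here is a minimal sketch of how such an identifier could be derived. It assumes the "Software Heritage hash" is the Git-blob-style SHA-1 that Software Heritage uses to identify file contents. The `LicenseRef-ort-swh-` prefix and the function names are hypothetical and not taken from the PR.

```python
import hashlib
import re


def swh_content_hash(content: bytes) -> str:
    """Return the Git-blob-style SHA-1 of the given file content."""
    # Git (and Software Heritage, for contents) hashes "blob <size>\0" + data.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def license_ref(filename: str, content: bytes) -> str:
    """Build a hypothetical LicenseRef from a file name and its content hash."""
    # SPDX idstrings may only contain letters, digits, "." and "-",
    # so any other character in the file name is replaced by "-".
    safe_name = re.sub(r"[^A-Za-z0-9.-]", "-", filename)
    # The prefix is an assumption made for this sketch only.
    return f"LicenseRef-ort-swh-{safe_name}-{swh_content_hash(content)}"
```

Because only the file name and the content enter the identifier, the same license text served from two different URLs would yield the same LicenseRef, while a change to the text would yield a new one.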

sschuberth avatar Apr 14 '22 08:04 sschuberth

An alternative solution could be to extend the declared license mapping so that it allows specifying a content hash.
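
As a rough sketch of that alternative, a mapping could be keyed by the content hash instead of by a generated license identifier. Everything here is an assumption made for illustration: the function names, the shape of the mapping, and the placeholder text that stands in for a real license file. It is not ORT's actual configuration format.

```python
import hashlib
from typing import Dict


def git_blob_sha1(content: bytes) -> str:
    """Return the Git-blob-style SHA-1 of the given file content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def map_declared_license(url: str, content: bytes, mapping: Dict[str, str]) -> str:
    """Map the content behind a license URL to an SPDX expression.

    Falls back to the URL itself if the content hash is not in the mapping.
    """
    return mapping.get(git_blob_sha1(content), url)


# Placeholder bytes standing in for the text of a real license file.
sample_text = b"Placeholder license text\n"

# Hypothetical mapping from content hash to SPDX license expression.
hash_mapping = {git_blob_sha1(sample_text): "MIT"}

mapped = map_declared_license("https://example.com/license", sample_text, hash_mapping)
unmapped = map_declared_license("https://example.com/license", b"changed", hash_mapping)
```

With this shape, no new license identifier is created per hash. Content that is not covered by the mapping simply keeps the URL as its declared license.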

fviernau avatar Sep 23 '24 08:09 fviernau

An alternative solution could be to extend the declared license mapping so that it allows specifying a content hash.

Isn't that exactly what's being done with e.g. https://github.com/oss-review-toolkit/ort/pull/5259/files#diff-4e93601aa35e18daab883e47b59f0e54cc0614a6b0d76e3f60acb78f8965f57aR321? Or do you mean a simpler syntax that does not incorporate the content hash into a LicenseRef, but just lists the content hash?

sschuberth avatar Sep 23 '24 08:09 sschuberth

Isn't that exactly what's being done with e.g. https://github.com/oss-review-toolkit/ort/pull/5259/files#diff-4e93601aa35e18daab883e47b59f0e54cc0614a6b0d76e3f60acb78f8965f57aR321? Or do you mean a simpler syntax that does not incorporate the content hash into a LicenseRef, but just lists the content hash?

I meant a solution which works without creating new generated license identifiers per distinct (url, hash).

fviernau avatar Sep 23 '24 08:09 fviernau

I meant a solution which works without creating new generated license identifiers per distinct (url, hash).

IMO that's the case already for the current proposal. New entries are only generated for each new (filename, hash) pair, but not for each (url, hash) pair. And I'd also be fine with dropping the filename from the LicenseRef.

Edit: That actually goes a bit in the direction of this old proposal.

sschuberth avatar Sep 23 '24 09:09 sschuberth

IMO that's the case already for the current proposal. New entries are only generated for each new (filename, hash) pair, but not for each (url, hash) pair. And I'd also be fine with dropping the filename from the LicenseRef.

I believe using the content hash from time = now() is wrong anyway, because what matters is the content at the point in time the artifact was created, not the content at now(). Either the content from back then is unknown, or it can be found manually, e.g. using the Wayback Machine. In the latter case the process is manual anyway, so the result could just be encoded using a declared license mapping. I simply do not understand how this proposal can help solve the problem in a better way.

fviernau avatar Sep 23 '24 09:09 fviernau

I believe using the content hash from time = now() is wrong anyway, because what matters is the content at the point in time the artifact was created, not the content at now(). Either the content from back then is unknown, or it can be found manually, e.g. using the Wayback Machine. In the latter case the process is manual anyway, so the result could just be encoded using a declared license mapping. I simply do not understand how this proposal can help solve the problem in a better way.

The original goal of this PR was not to implement a way to have declared license mappings for the contents of the license from back when the package was published, but, as a first step, to simply have something like "permalinks" to the current content of the license. Finding out the right permalink for a previous version of the contents could then be done as a second step.

sschuberth avatar Sep 23 '24 09:09 sschuberth

The original goal of this PR was not to implement a way to have declared license mappings for the contents of the license from back when the package was published,

From the PR description (see the following excerpt), I understood that this is being proposed in order to support a reliable mapping. However, as I pointed out, the (current) content is not relevant to the mapping.

To support a reliable mapping if only a license URL is present, do not use the URL itself as a declared license, but replace it with an ORT-specific SPDX LicenseRef that includes the Software Heritage hash over the content of the file the license URL refers to.

fviernau avatar Sep 23 '24 09:09 fviernau

From the PR description (see the following excerpt), I understood that this is being proposed in order to support a reliable mapping.

I see how this can be misleading depending on individual expectations, but "reliable" was meant more in the sense of "deterministic" regarding the content a URL points to.

sschuberth avatar Sep 23 '24 11:09 sschuberth