go-licenses

improve robustness of linking to license on hosting website

Open Bobgy opened this issue 3 years ago • 8 comments

In v2, I implemented some utils that get the GitHub repo from the go-import meta tag (requested with ?go-get=1) and use it to generate public, versioned links to detected licenses on the hosting website (for now, only GitHub).

I noticed some harder problems:

  1. distinguishing "major branch" and "major subdirectory" conventions

There is one problem: for a major version greater than 1, the templates for “major branch” and “major subdirectory” conventions differ (See https://research.swtch.com/vgo-module for a discussion of these conventions.) To determine the right template, make a HEAD request for the go.mod file using each template, and select the one that succeeds. For example, for module github.com/a/b/v2 at version v2.3.4, probe both github.com/a/b/blob/v2.3.4/go.mod (the location of the go.mod file using the “major branch” convention) and github.com/a/b/blob/v2.3.4/v2/go.mod (its location using “major subdirectory”).
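The probing described above can be sketched roughly as follows. This is a minimal illustration, not the pkgsite implementation: the names candidateGoModURLs, headOK and detectConvention are hypothetical, and checking the subdirectory location first is an assumption of this sketch (under "major subdirectory" the repo root usually still carries the v0/v1 go.mod, so the root probe alone cannot tell the two apart).

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// candidateGoModURLs builds the two go.mod locations to probe for a module
// with the given major version suffix (for example "v2") at a version tag.
func candidateGoModURLs(repoURL, version, majorSuffix string) (majorBranch, majorSubdir string) {
	base := fmt.Sprintf("%s/blob/%s", strings.TrimSuffix(repoURL, "/"), version)
	return base + "/go.mod", base + "/" + majorSuffix + "/go.mod"
}

// headOK reports whether a HEAD request for url returns 200 OK.
func headOK(url string) bool {
	resp, err := http.Head(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// detectConvention probes both locations through the injected exists
// function. The subdirectory is checked first (an assumption of this sketch).
func detectConvention(exists func(url string) bool, repoURL, version, majorSuffix string) string {
	branch, subdir := candidateGoModURLs(repoURL, version, majorSuffix)
	switch {
	case exists(subdir):
		return "major subdirectory"
	case exists(branch):
		return "major branch"
	default:
		return "unknown"
	}
}

func main() {
	// Offline stand-in for headOK: pretend only the root go.mod exists.
	fake := func(url string) bool {
		return url == "https://github.com/a/b/blob/v2.3.4/go.mod"
	}
	fmt.Println(detectConvention(fake, "https://github.com/a/b", "v2.3.4", "v2"))
	// To probe a real host, pass headOK instead of fake.
}
```

Injecting the existence check keeps the decision logic testable without network access; headOK is only one possible implementation of it.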

  2. support modules not at the root of a repo, for example https://github.com/Azure/go-autorest/tree/autorest/v0.9.0. Note that the tags are also different: the tag "autorest/v0.9.0" means version v0.9.0 of the module ROOT/autorest. https://github.com/googleapis/google-cloud-go/tree/master/storage is another example; its tags have a "storage/" prefix.
  3. support other source hosting websites
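To illustrate the tag convention for modules that are not at the repo root, here is a small sketch. The helpers tagForModule and treeURL are hypothetical names, and the sketch assumes the subdirectory is given relative to the repo root without any major version directory such as v2.

```go
package main

import (
	"fmt"
	"strings"
)

// tagForModule returns the VCS tag for a module version. subdir is the
// module directory relative to the repo root ("" for a root module).
func tagForModule(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version
	}
	return subdir + "/" + version
}

// treeURL builds a GitHub-style link to the repo tree at the module's tag.
func treeURL(repoURL, subdir, version string) string {
	return fmt.Sprintf("%s/tree/%s", strings.TrimSuffix(repoURL, "/"), tagForModule(subdir, version))
}

func main() {
	fmt.Println(treeURL("https://github.com/Azure/go-autorest", "autorest", "v0.9.0"))
	fmt.Println(tagForModule("storage", "v1.10.0"))
}
```

The first call reproduces the go-autorest link from the list above; the second yields the storage/v1.10.0 tag form used by google-cloud-go.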

Potential Solution

@wlynch pointed out the references below: pkgsite has an internal source package that does exactly what we need. It can figure out the repo hosting website for a Go import path and produce a public link to the source code. However, the package is internal, so we cannot import it directly.

I'll ask whether they are ready to make it public; otherwise I'll have to vendor it in some way.

EDIT: the reply is that we need to vendor it: https://github.com/golang/go/issues/40477#issuecomment-868532845.

References

  • https://github.com/google/go-licenses/pull/67#discussion_r653969960
  • https://github.com/golang/go/issues/39559
  • https://github.com/golang/go/issues/40477

Bobgy, Jun 25 '21 08:06

I noticed that problems 2 and 3 are mostly solved by the pkgsite/source package, while problem 1 (distinguishing the "major branch" and "major subdirectory" conventions) may still cause incorrect remote URLs.

We still need to keep this issue open.

Bobgy, Jan 05 '22 11:01

Here is a failing example for problem 2, "support modules not at the root of a repo":

$ go-licenses csv cloud.google.com/go/storage
...
cloud.google.com/go/storage, https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE, Apache-2.0
...

Note that the URL https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE is broken; the correct URL is https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/LICENSE. The problem is caused by the following:

  • for a module in a subdirectory of a repo, when Go caches the module files and finds that the submodule has no LICENSE file, it "magically" copies the LICENSE file from the repo root into the submodule, e.g. https://github.com/googleapis/google-cloud-go/tree/storage/v1.10.0/storage
  • therefore, go-licenses finds a LICENSE file at the root of the submodule and guesses that its remote URL is also at the root of the submodule, while the actual LICENSE file is at the root of the repo

Note, adopting pkgsite/source allowed us to get the correct tag storage/v1.10.0 for this repo, but we still hit this LICENSE file path problem.

Bobgy, Jan 23 '22 10:01

Examples for problem 1: distinguishing "major branch" and "major subdirectory" conventions

Major branch (result is correct)

Major branch: a new major version is released in a branch; source code is at the root of the repo.
gopkg.in/yaml.v2
License: https://github.com/go-yaml/yaml/blob/v2.4.0/LICENSE

Major subdirectory (incorrect)

Major subdir: a new major version is released in a subdirectory on the same branch as v1; source code for v2 is in the subdirectory ./v2/.
github.com/googleapis/gax-go/v2
License: got https://github.com/googleapis/gax-go/blob/v2.1.1/v2/LICENSE, but should be https://github.com/googleapis/gax-go/blob/v2.1.1/LICENSE

Therefore, the root cause of this failure is in fact the same as in https://github.com/google/go-licenses/issues/73#issuecomment-1019453152: the guessed URL is incorrect for a module that is not at the root of a repo.

Bobgy, Jan 23 '22 11:01

Added a v2 proposal roadmap item: validate the license URL by fetching it. That way we can detect these failures and either turn the URL into unknown or try other locations, finally verifying that the remote file content is exactly the same as the local file. With these workarounds, we can mitigate the problem of users unknowingly getting an invalid URL.

Bobgy, Jan 24 '22 02:01

Furthermore, we can solve all of the broken cases above by:

  1. Infer the remote license URL as usual
  2. Fetch the raw license file from the remote and validate that it is the same as the locally found license file
  3. If step 2 fails, try and validate the LICENSE at the repo root
  4. If everything fails, return UNKNOWN
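The steps above could look roughly like the following sketch. It is not the go-licenses implementation: fetchFunc, resolveLicenseURL and fakeFetcher are hypothetical names, and the fetcher is assumed to return the raw file content for a URL (a real one would have to request the raw file rather than the HTML blob page).

```go
package main

import (
	"bytes"
	"fmt"
)

// unknownURL is returned when no candidate can be validated.
const unknownURL = "UNKNOWN"

// fetchFunc returns the raw content behind a URL (hypothetical abstraction).
type fetchFunc func(url string) ([]byte, error)

// resolveLicenseURL tries the candidates in order (inferred URL first, then
// the repo root) and returns the first whose content equals the local file.
func resolveLicenseURL(fetch fetchFunc, local []byte, candidates ...string) string {
	for _, u := range candidates {
		remote, err := fetch(u)
		if err != nil {
			continue // broken link: try the next location
		}
		if bytes.Equal(remote, local) {
			return u
		}
	}
	return unknownURL
}

// fakeFetcher serves content from a map so the sketch runs offline.
func fakeFetcher(files map[string][]byte) fetchFunc {
	return func(url string) ([]byte, error) {
		if b, ok := files[url]; ok {
			return b, nil
		}
		return nil, fmt.Errorf("not found: %s", url)
	}
}

func main() {
	local := []byte("license text found in the module cache")
	inferred := "https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/storage/LICENSE"
	repoRoot := "https://github.com/googleapis/google-cloud-go/blob/storage/v1.10.0/LICENSE"
	fetch := fakeFetcher(map[string][]byte{repoRoot: local})
	fmt.Println(resolveLicenseURL(fetch, local, inferred, repoRoot))
}
```

With the fake data, the inferred URL fails to fetch, so the repo root URL is returned; if neither candidate matched, the result would be UNKNOWN.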

Bobgy, Feb 03 '22 05:02

Could you export a (versioned) URL to the root of the repo as well? Possibly a breaking change to add it to the CSV, but it could be added to the data available to templates.

I'm creating a licenses page in my web app and would like to link the package name to the respective github (or wherever) page.

dschmidt, Sep 06 '22 10:09

Possibly a breaking change to add it to the CSV

The csv format is fixed, I would not modify it.

but it could be added to the data available to templates.

A PR is welcome; this isn't too hard.

Bobgy, Sep 06 '22 13:09

Okies, already started and have it basically working - unfortunately I won't have time to polish/finish it this/next week, but will do when I get to it.

dschmidt, Sep 06 '22 13:09