Feature: improve packaging

Open laurentsimon opened this issue 4 years ago • 14 comments

Improvements:

  1. The Packaging check only looks for GH packaging workflows. This is not the only way to publish code. We should check for the presence of the package on language registries. Example for npm: the package.json has a "repository" field, and metadata may be available from the npm API. Alternatively, we could read the name in the repository's package.json, then check npm to see if that package exists.

  2. The check currently uses regex; we should switch to proper parsing.

  3. ~~we're missing some of the registries in https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-rubygems-registry~~

  4. ~~we're missing go packages, see https://github.com/ossf/scorecard/blob/main/.github/workflows/goreleaser.yaml~~

  5. we're missing GitHub Marketplace actions

  6. ~~Update the Token-Permission workflow as well, as it also checks for the need of packages permission.~~
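
To make item 1 concrete for npm, here is a minimal sketch of the "read the name from package.json, then ask the registry" idea. It assumes the public registry answers `https://registry.npmjs.org/<name>` with 200 for an existing package; the function names (`http_status`, `npm_package_name`, `npm_registry_url`, `is_published_on_npm`) are made up for illustration and are not Scorecard code. Note that a matching name does not prove the registry package was built from this repository.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET on url, or None on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # DNS failure, timeout, refused connection, ...


def npm_package_name(package_json_text):
    """Parse package.json (no regex) and return its "name", or None if absent."""
    data = json.loads(package_json_text)
    if not isinstance(data, dict):
        return None
    name = data.get("name")
    return name if isinstance(name, str) and name else None


def npm_registry_url(name):
    # Scoped names such as "@scope/pkg" keep the "@" but encode the "/".
    return "https://registry.npmjs.org/" + quote(name, safe="@")


def is_published_on_npm(package_json_text, status_of=http_status):
    """True only if the name in package.json resolves to a 200 on the registry."""
    name = npm_package_name(package_json_text)
    if name is None:
        return False
    return status_of(npm_registry_url(name)) == 200
```

The `status_of` parameter is only there so the lookup can be swapped for a stub in tests.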

laurentsimon avatar Jul 13 '21 16:07 laurentsimon

Improvements:

  1. The Packaging check only looks for GH packaging workflows. This is not the only way to publish code. We should check for the presence of the package on language registries.

For Go, we would like to query https://pkg.go.dev/ and see if a corresponding package exists.

laurentsimon avatar Aug 24 '21 18:08 laurentsimon

pkg.go.dev developer here.

pkg.go.dev learns everything it knows from the Go module proxy, https://proxy.golang.org. Visit that page for a description of the protocol.

So the proxy is the source of truth, and it can also handle much higher QPS than us. However, it's less discriminating: it doesn't examine what it's given to make sure it's really a Go module. For instance, it doesn't check for the presence of .go files. (No .go file, no module.) We do.

So if that is important to you, then checking pkg.go.dev is reasonable, provided it's at relatively low QPS. If you know the version of the module you're looking for, supplying it will reduce load on us. A sufficient check for existence is to check the status of a HEAD request, and treat anything other than 200 as false.
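
A minimal sketch of the existence check described above: send a HEAD request and treat anything other than 200 as "not found". It assumes the usual `https://pkg.go.dev/<module>@<version>` URL shape when a version is known; the helper names (`head_status`, `pkg_go_dev_url`, `module_exists`) are illustrative only.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the status code of a HEAD request, or None on a network failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return None


def pkg_go_dev_url(module_path, version=None):
    """Build the pkg.go.dev page URL; pinning a version reduces load on the site."""
    url = "https://pkg.go.dev/" + module_path
    if version:
        url += "@" + version
    return url


def module_exists(module_path, version=None, status_of=head_status):
    # Anything other than 200 (404, 5xx, network error) counts as "does not exist".
    return status_of(pkg_go_dev_url(module_path, version)) == 200
```

Since the site asks for relatively low QPS, a real caller would probably want to cache results and rate-limit these requests.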

jba avatar Oct 24 '21 13:10 jba

Thanks, is there an API for pkg.go.dev? It would help a lot to avoid parsing HTML.

naveensrinivasan avatar Oct 24 '21 16:10 naveensrinivasan

There is no API, but if you're just checking for existence you don't need to parse HTML. Is there some other information you need?

jba avatar Oct 24 '21 21:10 jba

Thanks @jba. So essentially we just need to check the HTTP status. That should be good enough for our current use case, I think. @naveensrinivasan anything else you think we need?

laurentsimon avatar Oct 25 '21 17:10 laurentsimon

I can't think of anything as of now. Thanks

naveensrinivasan avatar Oct 25 '21 17:10 naveensrinivasan

@di what can be done on the pypi side, similar to https://github.com/ossf/scorecard/issues/688#issuecomment-950321692?

laurentsimon avatar Nov 01 '21 23:11 laurentsimon

IIUC you're looking for a way to determine if a given project name is published on PyPI?

That would require checking if the name exists (via HTTP status) at either:

  • Simple API: https://pypi.org/simple/<project_name>
  • JSON API: https://pypi.org/pypi/<project_name>/json

More details on available APIs here: https://warehouse.pypa.io/api-reference/
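
As a rough sketch of that status check against the JSON API endpoint listed above (helper names such as `http_status` and `is_published_on_pypi` are hypothetical):

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET on url, or None on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return None


def pypi_json_url(project_name):
    """URL of the JSON API document for a project name."""
    return "https://pypi.org/pypi/" + quote(project_name, safe="") + "/json"


def is_published_on_pypi(project_name, status_of=http_status):
    # A 200 means the name exists on PyPI; anything else is treated as absent.
    return status_of(pypi_json_url(project_name)) == 200
```

This only answers "does a project with this name exist?", not whether it was built from a particular repository.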

di avatar Nov 01 '21 23:11 di

That's exactly what we need. Thank you!

laurentsimon avatar Nov 02 '21 15:11 laurentsimon

@di is there an API that takes as input a GitHub repository instead of a package name? Or would we need to query the "project links" to infer the package name to input to the API? If so, what's the best way to do it?

laurentsimon avatar Feb 02 '22 16:02 laurentsimon

@laurentsimon You mean that you have a GitHub repo and you want to determine what PyPI project it corresponds to?

There are a few ways:

  • Build the project hosted in the repo (complicated, might not work), see what package name it produces, and assume this corresponds to a project on PyPI (it might have the same name but not be the same project, so it's not guaranteed)
  • Build a mapping of project links to PyPI packages (no API exists for this) and make the assumption that the metadata is correct (anyone can put any project link pointing to any GitHub repo they want, so it's not guaranteed)
  • Wait for PyPI's OIDC integration & support for publishing from GitHub Actions to land, which requires a strong link between a GitHub repo and a PyPI project (guaranteed, and we can make this available via an API)
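
To illustrate the second option and its weakness: if a candidate project name is already known, its JSON API metadata can be checked for links back to the repo. The sketch below assumes the metadata has already been fetched and decoded into a dict, and that links live under `info.project_urls` and `info.home_page`; `normalize_repo` and `claims_repo` are hypothetical helpers. A match only means the project claims the repo, which, as noted, anyone can do.

```python
from urllib.parse import urlparse


def normalize_repo(url):
    """Reduce a GitHub URL to lowercase 'owner/repo', or None if it is not one."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return (owner + "/" + repo).lower()


def claims_repo(pypi_metadata, repo_url):
    """True if any project link in the metadata points at repo_url (unverified claim)."""
    target = normalize_repo(repo_url)
    if target is None:
        return False
    info = pypi_metadata.get("info") or {}
    links = list((info.get("project_urls") or {}).values())
    if info.get("home_page"):
        links.append(info["home_page"])
    return any(
        normalize_repo(link) == target for link in links if isinstance(link, str)
    )
```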

di avatar Feb 02 '22 16:02 di

Thank you @di. Let's wait until OIDC provides the magic.

laurentsimon avatar Feb 02 '22 16:02 laurentsimon

We need to decide the purpose of this check: is it to know whether a corresponding package exists in the ecosystem's registry, or whether the publishing steps run in CI rather than on a local dev machine? If the latter, we only need to look for additional commands and not worry about querying package registries.
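
For the second interpretation, a rough sketch of "look for additional commands": walk the `run` steps of an already-parsed workflow (a plain dict, as a YAML parser would produce) and tokenize each line instead of using regex. The command list is a hypothetical, non-exhaustive example and is not what Scorecard actually checks.

```python
import shlex

# Hypothetical examples of publish commands; extend or replace as needed.
PUBLISH_COMMANDS = {
    ("npm", "publish"),
    ("twine", "upload"),
    ("cargo", "publish"),
    ("gem", "push"),
}


def run_lines(workflow):
    """Yield every shell line from the `run` steps of a parsed workflow dict."""
    for job in (workflow.get("jobs") or {}).values():
        for step in job.get("steps") or []:
            script = step.get("run")
            if isinstance(script, str):
                yield from script.splitlines()


def publishes_from_ci(workflow):
    """True if any run line starts with one of the known publish commands."""
    for line in run_lines(workflow):
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # unbalanced quotes, e.g. a multi-line string; skip the line
        if tuple(tokens[:2]) in PUBLISH_COMMANDS:
            return True
    return False
```

This misses indirect invocations (for example `python -m twine upload`, or publishing done inside a reusable action), so it is a lower bound rather than a complete check.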

laurentsimon avatar Aug 23 '22 15:08 laurentsimon

Also note that for Go projects, this check will fail, and that is a false negative. Go projects don't need to be "released", since they are fetched directly from the repository (and then cached) when consumers run go install or go get.

laurentsimon avatar Aug 23 '22 15:08 laurentsimon