scorecard
Feature: improve packaging
Improvements:
- The `Packaging` check only looks for GH packaging workflows. This is not the only way to publish code. We should check for the presence of the package on language repos. Example for npm: the package.json has a "repository" field, and metadata may be available from the npm API. Alternatively, we could look at the `name` in the package.json of the repository, then check `npm` to see if that package exists.
- The check currently uses regex; we should switch to parsing properly.
- ~~we're missing some of the registries in https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-rubygems-registry~~
- ~~we're missing go packages, see https://github.com/ossf/scorecard/blob/main/.github/workflows/goreleaser.yaml~~
- We're missing GitHub Marketplace actions.
- ~~Update the Token-Permission workflow as well, as it also checks for the need of `packages` permission.~~
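As a rough illustration of the "look at the `name` in package.json, then check npm" idea, here is a minimal Python sketch. The registry base URL `https://registry.npmjs.org`, the percent-encoding of scoped names, and all function names are assumptions made for this example and are not taken from the issue. A 200 only shows that some package with that name exists, not that it was published from this repository.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the public npm registry serves package metadata at /<name>.
NPM_REGISTRY = "https://registry.npmjs.org"


def npm_registry_url(package_json_text: str) -> str:
    """Build the registry metadata URL from the `name` field of a package.json."""
    name = json.loads(package_json_text)["name"]
    # Scoped names such as "@scope/pkg" need the slash percent-encoded.
    return f"{NPM_REGISTRY}/{urllib.parse.quote(name, safe='@')}"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET on `url`, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def is_published_on_npm(package_json_text: str, status_of=http_status) -> bool:
    """True if the registry answers 200 for the package named in package.json."""
    return status_of(npm_registry_url(package_json_text)) == 200
```

The `status_of` parameter is only there so the lookup can be replaced by a stub when no network access is available.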
> Improvements:
> - The `Packaging` check only looks for GH packaging workflows. This is not the only way to publish code. We should check for the presence of the package on language repos.
For Go, we would like to query https://pkg.go.dev/ and see whether a corresponding package exists.
pkg.go.dev developer here.
pkg.go.dev learns everything it knows from the Go module proxy, https://proxy.golang.org. Visit that page for a description of the protocol.
So the proxy is the source of truth, and it can also handle much higher QPS than us. However, it's less discriminating: it doesn't examine what it's given to make sure it's really a Go module. For instance, it doesn't check for the presence of .go files. (No .go file, no module.) We do.
So if that is important to you, then checking pkg.go.dev is reasonable, provided it's at relatively low QPS. If you know the version of the module you're looking for, supplying it will reduce load on us. A sufficient check for existence is to check the status of a HEAD request, and treat anything other than 200 as false.
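The existence check described above could look roughly like the following Python sketch: send a HEAD request to the module's pkg.go.dev page and treat anything other than 200 as "not found". The `https://pkg.go.dev/<module>@<version>` URL shape and the function names are assumptions for illustration. Network failures (as opposed to HTTP error statuses) are not handled here and would raise.

```python
import urllib.error
import urllib.request

PKG_GO_DEV = "https://pkg.go.dev"


def pkg_go_dev_url(module_path: str, version: str = "") -> str:
    """Page URL for a module; supplying a version reduces load on the site."""
    suffix = f"@{version}" if version else ""
    return f"{PKG_GO_DEV}/{module_path}{suffix}"


def head_status(url: str) -> int:
    """Return the status code of a HEAD request, including error statuses."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def module_exists(module_path: str, version: str = "", status_of=head_status) -> bool:
    """Anything other than 200 is treated as 'does not exist'."""
    return status_of(pkg_go_dev_url(module_path, version)) == 200
```

A caller would keep the request rate low, for example by checking each repository at most once per scan.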
Thanks. Is there an API for pkg.go.dev? That would help a lot compared with parsing HTML.
There is no API, but if you're just checking for existence you don't need to parse HTML. Is there some other information you need?
Thanks, @jba. So essentially we just need to check the HTTP status. That should be good enough for our current use case, I think. @naveensrinivasan, is there anything else you think we need?
I can't think of anything as of now. Thanks.
@di what can be done on the pypi side, similar to https://github.com/ossf/scorecard/issues/688#issuecomment-950321692?
IIUC you're looking for a way to determine if a given project name is published on PyPI?
That would require checking if the name exists (via HTTP status) at either:
- Simple API: https://pypi.org/simple/<project_name>
- JSON API: https://pypi.org/pypi/<project_name>/json
More details on available APIs here: https://warehouse.pypa.io/api-reference/
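A minimal Python sketch of that name check, using the two URL patterns listed above and treating a 200 as "the name exists on PyPI". The function names and the injectable `status_of` hook are assumptions for illustration only.

```python
import urllib.error
import urllib.parse
import urllib.request

PYPI = "https://pypi.org"


def pypi_simple_url(project_name: str) -> str:
    """Simple API URL for a project name."""
    return f"{PYPI}/simple/{urllib.parse.quote(project_name, safe='')}"


def pypi_json_url(project_name: str) -> str:
    """JSON API URL for a project name."""
    return f"{PYPI}/pypi/{urllib.parse.quote(project_name, safe='')}/json"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET on `url`, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def exists_on_pypi(project_name: str, status_of=http_status) -> bool:
    """True if the JSON API answers 200 for this project name."""
    return status_of(pypi_json_url(project_name)) == 200
```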
That's exactly what we need. Thank you!
@di is there an API that takes as input a GitHub repository instead of a package name? Or would we need to query the "project links" to infer the package name to input to the API? If so, what's the best way to do it?
@laurentsimon You mean that you have a GitHub repo and you want to determine what PyPI project it corresponds to?
There are a couple of ways:
- Build the project hosted in the repo (complicated, might not work), see what package name it produces, and assume this corresponds to a project on PyPI (it might have the same name but not be the same project, so it's not guaranteed)
- Build a mapping of project links to PyPI packages (no API exists for this) and make the assumption that the metadata is correct (anyone can put any project link pointing to any GitHub repo they want, so it's not guaranteed)
- Wait for PyPI's OIDC integration & support for publishing from GitHub Actions to land, which requires a strong link between a GitHub repo and a PyPI project (guaranteed, and we can make this available via an API)
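To make the second option concrete, here is a hedged Python sketch that checks whether a project's metadata links back to a given GitHub repository. It assumes the metadata is the parsed response of the JSON API and that the links live under `info.project_urls` and `info.home_page`; the function names are hypothetical. As noted above, anyone can put any link in their metadata, so a match is a hint and not proof.

```python
from urllib.parse import urlparse


def normalize_repo(url: str) -> str:
    """Reduce a GitHub URL to lowercase 'owner/repo', or '' if it is not one."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return ""
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return ""
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}".lower()


def claims_repo(pypi_metadata: dict, repo_url: str) -> bool:
    """True if any project link in the metadata points at `repo_url`."""
    target = normalize_repo(repo_url)
    info = pypi_metadata.get("info") or {}
    # project_urls may be null in the metadata, so fall back to an empty dict.
    links = list((info.get("project_urls") or {}).values())
    links.append(info.get("home_page"))
    return bool(target) and any(normalize_repo(link or "") == target for link in links)
```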
Thank you, @di. Let's wait until OIDC provides the magic.
We need to decide the purpose of this check: is it to know that there is a corresponding package for the ecosystem, or whether the publishing steps occur on CI and not a local dev machine? If the latter, we only need to look for additional commands and not worry about querying package registries.
Also note that for Go projects, this check will fail, and that is a false negative. Go projects don't need to be "released", since they are fetched directly from the repository (and then cached) when consumers run `go install` or `go get`.