PURLDB: Add on-demand package data collection for golang
We want to enable the collect/ endpoint for golang so that collecting a golang PURL such as pkg:golang/github.com/gorilla/context no longer fails with:
{
"status": "cannot fetch Package data for pkg:golang/github.com/gorilla/context: no available handler"
}
(https://pkg.go.dev/github.com/gorilla/context)
Since most Go projects are hosted on GitHub, we can reuse the existing "pkg:github/" code path for these projects.
The following are some sample projects that are not hosted on GitHub:
- https://pkg.go.dev/ben.gmbh/sourcehut-vanity
- https://pkg.go.dev/gitlab.com/gitlab-org/api/client-go
- https://pkg.go.dev/golang.org/x/oauth2/bitbucket
- https://pkg.go.dev/bitbucket.org/lebronto_kerovol/gwerror
@JonoYang I am able to make it work with
pkg:golang/github.com/*
pkg:golang/gitlab.com/*
pkg:golang/bitbucket.org/*
which should cover most Go projects. However, for the rest, it's hard to fetch the metadata unless we do web scraping. Do you think web scraping is the correct approach for the other cases?
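To make the three patterns above concrete, here is a minimal sketch of how a golang PURL could be mapped to a host-specific PURL. This is not the code from the PR: the function name `golang_to_host_purl` and the `SUPPORTED_HOSTS` table are hypothetical, the parsing is plain string handling rather than packageurl-python, and treating every GitLab path segment as part of the project (to allow for subgroups) is an assumption.

```python
from typing import Optional

# Hypothetical mapping from a Go import-path host to a PURL type.
SUPPORTED_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def golang_to_host_purl(purl: str) -> Optional[str]:
    """Return a pkg:github/, pkg:gitlab/ or pkg:bitbucket/ PURL, or None."""
    prefix = "pkg:golang/"
    if not purl.startswith(prefix):
        return None

    # Drop any subpath (#...) and qualifiers (?...), then split off the version.
    remainder = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path, _, version = remainder.partition("@")

    parts = [segment for segment in path.split("/") if segment]
    if len(parts) < 3:
        return None

    purl_type = SUPPORTED_HOSTS.get(parts[0].lower())
    if purl_type is None:
        # Vanity or other hosts (golang.org/x/..., ben.gmbh/...) need another source.
        return None

    if purl_type == "gitlab":
        # Assumption: GitLab subgroups make the repository depth ambiguous,
        # so keep the whole path as namespace/name.
        namespace, name = "/".join(parts[1:-1]), parts[-1]
    else:
        # GitHub and Bitbucket repositories are always owner/repo;
        # further segments are packages inside the repository.
        namespace, name = parts[1], parts[2]

    result = f"pkg:{purl_type}/{namespace}/{name}"
    if version:
        result += f"@{version}"
    return result
```

A PURL for which this returns None (for example pkg:golang/golang.org/x/oauth2) is exactly the "rest" that still needs a different metadata source.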
@chinyeungli do you mean scraping the GitHub/GitLab/Bitbucket page for metadata? There may be an API for that; otherwise I think it's worth trying.
No. I mean I can retrieve metadata from APIs for GitHub, GitLab, and Bitbucket. I was referring to scraping metadata for other sources on pkg.go.dev such as https://pkg.go.dev/ben.gmbh/sourcehut-vanity and https://pkg.go.dev/golang.org/x/oauth2/bitbucket
I think we should scrape pkg.go.dev. There may be some code for this in spats
For GitLab, it sometimes prompts for a username and password in my terminal. It runs fine if I simply press Enter without entering either, but if I don't press Enter, it hangs waiting for input.
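If the prompt comes from a git command talking to GitLab over HTTPS (an assumption; the comment above does not say which command asks for credentials), one possible way to avoid the hang is to disable git's terminal prompt so the command fails fast instead of waiting. The helper names below are hypothetical; `GIT_TERMINAL_PROMPT` is a standard git environment variable.

```python
import os
import subprocess


def non_interactive_git_env(base_env=None):
    """Copy the environment and tell git never to prompt on the terminal."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def list_remote_tags(repo_url, timeout=30):
    """Run `git ls-remote --tags` without ever blocking on user input."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        env=non_interactive_git_env(),
        stdin=subprocess.DEVNULL,  # nothing to read even if a helper tries
        capture_output=True,
        text=True,
        timeout=timeout,  # safety net in case something still blocks
        check=False,
    )
    if result.returncode != 0:
        # Private or missing repositories end up here instead of hanging.
        return []
    return result.stdout.splitlines()
```

With this setup a repository that requires authentication produces an error right away, which the collector can treat as "no data available".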
IMHO we should use the Go proxy if possible, not pkg.go.dev... unless that's the only place with details.
@pombredanne Are you referring to https://proxy.golang.org/ ? I don’t see how to retrieve metadata from it. However, I just discovered https://deps.dev/ , a Google project, which might be able to do the job. I’ll give it a shot.
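For context on the two sources mentioned here, the sketch below only builds the URLs involved. The Go module proxy protocol exposes a version list plus `.info` (version and timestamp), `.mod` and `.zip` files per version, and expects uppercase letters in the module path to be escaped as `!` followed by the lowercase letter. The deps.dev URL follows the shape of its v3 API as I understand it and should be checked against the deps.dev documentation before relying on it. All function names are hypothetical, and both services work on module paths, which may be shorter than the package path in a PURL.

```python
from urllib.parse import quote


def escape_module_path(value: str) -> str:
    """Apply the Go proxy case escaping: 'A' becomes '!a'."""
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in value)


def go_proxy_urls(module: str, version: str, base: str = "https://proxy.golang.org") -> dict:
    """Return the module proxy endpoints for one module version."""
    m = escape_module_path(module)
    v = escape_module_path(version)
    return {
        "list": f"{base}/{m}/@v/list",      # newline-separated known versions
        "info": f"{base}/{m}/@v/{v}.info",  # JSON with Version and Time
        "mod": f"{base}/{m}/@v/{v}.mod",    # the go.mod file
        "zip": f"{base}/{m}/@v/{v}.zip",    # the source archive
    }


def deps_dev_version_url(module: str, version: str) -> str:
    """Build a deps.dev v3 version URL (assumed shape, verify before use)."""
    # The package name is a single path segment, so "/" must be percent-encoded.
    return (
        "https://api.deps.dev/v3/systems/go/packages/"
        f"{quote(module, safe='')}/versions/{quote(version, safe='')}"
    )
```

The proxy alone gives little beyond versions, timestamps and the archive itself, which is why a richer source such as deps.dev is attractive for fields like licenses or links.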
@chinyeungli please sync up with @TG1999 See in particular:
- https://github.com/package-url/packageurl-python/pull/195#discussion_r2236722238
- https://github.com/aboutcode-org/go-inspector/blob/442bc5b83d5aeff2b7a27937ec82b63277bc8f7c/src/go_inspector/utils.py#L211
The PR is at:
- https://github.com/aboutcode-org/purldb/pull/608
The build_golang_download_url() function needs to be fixed because the download URL it returns is incorrect. See https://github.com/package-url/packageurl-python/issues/198
This is ready for review. However, it currently depends on a scancode.io branch ( https://github.com/aboutcode-org/purldb/blob/596_add_on-demand_package_data_collection_for_golang/setup.cfg#L65 ) instead of a stable release, because scancode.io first needs to be updated to use packageurl-python >= 0.17.3.
Created a PR in scancode.io to upgrade packageurl-python to 0.17.3
- https://github.com/aboutcode-org/scancode.io/pull/1809
- PR created: https://github.com/aboutcode-org/purldb/pull/608
This is done now:
Reference:
- https://github.com/aboutcode-org/purldb/pull/608
- https://github.com/aboutcode-org/purldb/issues/644