PURLDB: Add on-demand package data collection for golang

Open chinyeungli opened this issue 9 months ago • 6 comments

We want to enable the collect/ endpoint for golang so that collecting a golang PURL such as pkg:golang/github.com/gorilla/context no longer fails with:

{
  "status": "cannot fetch Package data for pkg:golang/github.com/gorilla/context: no available handler"
}

(https://pkg.go.dev/github.com/gorilla/context)

chinyeungli avatar Mar 31 '25 05:03 chinyeungli

As most Go projects are hosted on GitHub, we can redirect these projects to the existing code for "pkg:github/".

Following are some sample projects that are not hosted on GitHub:

  • https://pkg.go.dev/ben.gmbh/sourcehut-vanity
  • https://pkg.go.dev/gitlab.com/gitlab-org/api/client-go
  • https://pkg.go.dev/golang.org/x/oauth2/bitbucket
  • https://pkg.go.dev/bitbucket.org/lebronto_kerovol/gwerror

chinyeungli avatar Apr 08 '25 22:04 chinyeungli

@JonoYang I am able to make it work with

pkg:golang/github.com/*
pkg:golang/gitlab.com/*
pkg:golang/bitbucket.org/*

which should cover most Go projects. For the rest, however, it's hard to fetch the metadata without web scraping. Do you think web scraping is the right approach for the other cases?
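To illustrate the host-based routing described above, here is a minimal sketch, not the code from the PR. The names `SUPPORTED_HOSTS`, `golang_host`, and `golang_to_github_purl` are hypothetical, and the sketch assumes plain string handling is enough (no qualifiers or subpaths are preserved). It only rewrites GitHub-hosted modules, since GitLab paths can contain subgroups and cannot be split as simply.

```python
# Hypothetical helper names; this is an illustration, not purldb's implementation.
SUPPORTED_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")
GOLANG_PREFIX = "pkg:golang/"


def golang_host(purl):
    """Return the hosting domain of a pkg:golang PURL, or None if it is not handled."""
    if not purl.startswith(GOLANG_PREFIX):
        return None
    host = purl[len(GOLANG_PREFIX):].split("/", 1)[0]
    return host if host in SUPPORTED_HOSTS else None


def golang_to_github_purl(purl):
    """Rewrite pkg:golang/github.com/<owner>/<repo>[@version] as a pkg:github PURL.

    Returns None when the PURL is not a GitHub-hosted Go module.
    """
    if golang_host(purl) != "github.com":
        return None
    path = purl[len(GOLANG_PREFIX + "github.com/"):]
    # Drop any subpath (#...) and qualifiers (?...) before reading the version.
    path = path.split("#", 1)[0].split("?", 1)[0]
    path, _, version = path.partition("@")
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) < 2:
        return None
    # Only the first two segments identify the repository; deeper segments
    # are Go sub-packages or major-version suffixes such as /v2.
    owner, repo = segments[0], segments[1]
    result = f"pkg:github/{owner}/{repo}"
    if version:
        result += f"@{version}"
    return result
```

With this sketch, `pkg:golang/ben.gmbh/sourcehut-vanity` yields `None` from `golang_host`, which corresponds to the "no available handler" case that still needs another data source.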

chinyeungli avatar Apr 14 '25 10:04 chinyeungli

@chinyeungli do you mean scraping the GitHub/GitLab/Bitbucket pages for metadata? There may be an API for that; otherwise I think it's worth trying.

JonoYang avatar Apr 14 '25 18:04 JonoYang

No. I mean I can retrieve metadata from APIs for GitHub, GitLab, and Bitbucket. I was referring to scraping metadata for other sources on pkg.go.dev such as https://pkg.go.dev/ben.gmbh/sourcehut-vanity and https://pkg.go.dev/golang.org/x/oauth2/bitbucket

chinyeungli avatar Apr 14 '25 22:04 chinyeungli

I think we should scrape pkg.go.dev. There may be some code for this in spats

JonoYang avatar Apr 14 '25 23:04 JonoYang

For GitLab, it sometimes asks for a username and password in my terminal. It runs fine if I simply press Enter without entering either, but if I don't press Enter, it appears to hang waiting for input.
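If the prompt comes from a `git` command run as a subprocess (an assumption; the thread does not say which tool is prompting), one possible way to avoid the hang is to disable git's terminal prompt with the `GIT_TERMINAL_PROMPT=0` environment variable and to close stdin. The helper names below are hypothetical, and `git ls-remote --tags` is only used as an example command.

```python
import os
import subprocess


def non_interactive_git_env(base_env=None):
    """Return a copy of the environment in which git will not prompt on the terminal."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set to 0, git fails instead of asking for a username/password.
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def list_remote_tags(repo_url, timeout=60):
    """Run `git ls-remote --tags` without ever waiting for user input.

    Returns (returncode, stdout); a non-zero code means git could not
    read the repository, for example because it requires credentials.
    """
    completed = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        env=non_interactive_git_env(),
        stdin=subprocess.DEVNULL,  # nothing to read, so nothing to wait on
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return completed.returncode, completed.stdout
```

A private or non-existent GitLab repository would then produce a non-zero return code rather than a blocked process.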

chinyeungli avatar Apr 15 '25 14:04 chinyeungli

IMHO we should use the Go proxy if possible, not pkg.go.dev... unless that's the only place with details.
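For context, the Go module proxy protocol serves a fixed set of per-module endpoints: a version list, a small JSON `.info` file, the `go.mod` file, and a source `.zip`. The sketch below builds those URLs; the function names are hypothetical and this is not the `build_golang_download_url()` code from packageurl-python. Uppercase letters in module paths and versions are escaped as `!` followed by the lowercase letter, as the protocol requires.

```python
GO_PROXY = "https://proxy.golang.org"


def escape_for_proxy(value):
    """Apply the proxy's case escaping: each uppercase letter becomes '!' + lowercase."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in value)


def proxy_urls(module, version):
    """Return the proxy endpoints for one module version (hypothetical helper)."""
    base = f"{GO_PROXY}/{escape_for_proxy(module)}/@v"
    escaped_version = escape_for_proxy(version)
    return {
        "list": f"{base}/list",                     # known versions, one per line
        "info": f"{base}/{escaped_version}.info",   # JSON with version and timestamp
        "mod": f"{base}/{escaped_version}.mod",     # the go.mod file
        "zip": f"{base}/{escaped_version}.zip",     # the module source archive
    }
```

These endpoints cover versions, dependencies, and downloads, but they do not carry descriptive fields such as a license or a homepage, so another source would still be needed for that part of the package data.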

pombredanne avatar Jul 24 '25 18:07 pombredanne

@pombredanne Are you referring to https://proxy.golang.org/ ? I don’t see how to retrieve metadata from it. However, I just discovered https://deps.dev/ , a Google project, which might be able to do the job. I’ll give it a shot.
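As a rough sketch of what a deps.dev lookup could look like, assuming the v3 REST layout `/v3/systems/{system}/packages/{name}/versions/{version}` with a percent-encoded package name: the helper names are hypothetical, the response fields are not interpreted here, and this is not necessarily how the PR retrieves its data.

```python
import json
import urllib.parse
import urllib.request

DEPS_DEV_API = "https://api.deps.dev/v3"


def deps_dev_version_url(module, version):
    """Build the deps.dev URL for one Go module version (assumed v3 layout)."""
    # The module path contains slashes, so it must be fully percent-encoded.
    encoded_name = urllib.parse.quote(module, safe="")
    encoded_version = urllib.parse.quote(version, safe="")
    return f"{DEPS_DEV_API}/systems/go/packages/{encoded_name}/versions/{encoded_version}"


def fetch_deps_dev_version(module, version, timeout=30):
    """Fetch and decode the JSON document for one module version."""
    url = deps_dev_version_url(module, version)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_deps_dev_version("github.com/gorilla/context", "v1.1.1")` performs a network request, so callers should expect `urllib.error.HTTPError` for unknown modules or versions.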

chinyeungli avatar Jul 25 '25 01:07 chinyeungli

@chinyeungli please sync up with @TG1999 See in particular:

  • https://github.com/package-url/packageurl-python/pull/195#discussion_r2236722238
  • https://github.com/aboutcode-org/go-inspector/blob/442bc5b83d5aeff2b7a27937ec82b63277bc8f7c/src/go_inspector/utils.py#L211

pombredanne avatar Jul 28 '25 14:07 pombredanne

The PR is at:

  • https://github.com/aboutcode-org/purldb/pull/608

pombredanne avatar Jul 28 '25 14:07 pombredanne

The build_golang_download_url() function needs to be fixed, as the download URL it returns is not correct. See https://github.com/package-url/packageurl-python/issues/198

chinyeungli avatar Jul 31 '25 00:07 chinyeungli

This is ready for review, but it currently uses a branch of scancode.io (https://github.com/aboutcode-org/purldb/blob/596_add_on-demand_package_data_collection_for_golang/setup.cfg#L65) instead of a stable release, because scancode.io first needs to be updated to use packageurl-python >= 0.17.3.

chinyeungli avatar Aug 01 '25 06:08 chinyeungli

Created a PR in scancode.io to upgrade packageurl-python to 0.17.3

  • https://github.com/aboutcode-org/scancode.io/pull/1809

chinyeungli avatar Aug 11 '25 08:08 chinyeungli

  • PR created: https://github.com/aboutcode-org/purldb/pull/608

chinyeungli avatar Sep 04 '25 06:09 chinyeungli

This is done now:

Reference:

  • https://github.com/aboutcode-org/purldb/pull/608
  • https://github.com/aboutcode-org/purldb/issues/644

TG1999 avatar Sep 09 '25 14:09 TG1999