feat(purl): first purl iteration

Open ffontaine opened this issue 2 years ago • 5 comments

Guess the purl using the purl2cpe database. Upstream purls (e.g. github, gitlab, sourceforge) are preferred over distribution-specific purls (e.g. debian, ubuntu, fedora).
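The preference for upstream purls over distribution-specific ones could be sketched roughly as below. This is only an illustration, not the code in this PR: the function names (`purl_type`, `pick_preferred`) and the sample candidates are hypothetical, and the list of upstream types is simply taken from the description above.

```python
from typing import List, Optional

# Types treated as "upstream", taken from the PR description.
# Anything else (deb, rpm, ...) is treated as distribution-specific here.
UPSTREAM_TYPES = ("github", "gitlab", "sourceforge")


def purl_type(purl: str) -> str:
    """Return the type component of a purl such as 'pkg:github/org/name'."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    return rest.split("/", 1)[0].lower()


def pick_preferred(candidates: List[str]) -> Optional[str]:
    """Pick the first upstream purl if any, otherwise the first candidate."""
    for purl in candidates:
        if purl_type(purl) in UPSTREAM_TYPES:
            return purl
    return candidates[0] if candidates else None


# Hypothetical candidates that a CPE lookup might return.
candidates = ["pkg:deb/debian/curl", "pkg:github/curl/curl"]
print(pick_preferred(candidates))  # pkg:github/curl/curl
```

If no upstream purl is available, this sketch falls back to whatever candidate comes first, which may or may not be the desired behaviour.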

In this first iteration, only JSON and CycloneDX are handled. Moreover, purl2cpe can't be installed through PyPI, so purl2cpe.db has to be built manually. I couldn't push it because the database is too big (around 350 MB).

ffontaine avatar Sep 14 '23 15:09 ffontaine

@ffontaine I have now had a look at the purl2cpe database. As you have noticed, the purls don't include any version information, which is very disappointing. Is there any reason why the upstream sources have been limited? Why not include language ecosystems such as pypi and npm, or distro sources such as deb?

This change would require the database to be included as part of lib4sbom. I suggest storing it in a separate directory, in the same way the license data is maintained, and accessing it in a similar way (see license.py). Maybe create a new class, PurlGenerator?

anthonyharrison avatar Oct 07 '23 15:10 anthonyharrison

Hey @anthonyharrison!

as @terriko mentioned in #3771:

Some things won't have CPE entries and thus won't be in purl2CPE. But we may know (from bug reports) that there's a product with the same name that is absolutely not the same thing. So we'll need to provide a "is not" database to reduce false positives. I suggest using a similar setup to what purl2cpe does -- allow humans to submit pull requests, make all the data readable, provide a way to load it into a queryable database.

We are going to need a database similar to purl2cpe to incorporate purls/products that don't have CPEs, so would it be reasonable to maintain the whole database in-house instead of making a new one? By doing so, we could also carry the version info from the CPE over into the purl, since it is given in the database:

pkg:github/silverstripe/silverstripe-framework|cpe:2.3:a:silverstripe:framework:4.13.25:*:*:*:*:*:*:*
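As a rough sketch of that idea, the version field of the CPE could be appended to the purl as shown below. The function name `purl_with_cpe_version` is hypothetical, and the parsing is deliberately naive: it assumes the `purl|cpe` line format shown above, a CPE 2.3 formatted string without escaped colons, a purl that has no version, qualifiers or subpath yet, and a version that needs no percent-encoding.

```python
def purl_with_cpe_version(line: str) -> str:
    """Append the CPE version to the purl from a 'purl|cpe' line."""
    purl, sep, cpe = line.strip().partition("|")
    if not sep:
        raise ValueError("expected a line of the form 'purl|cpe'")

    # CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:...
    # Naive split: escaped colons inside a field are not handled.
    fields = cpe.split(":")
    if len(fields) < 6 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")

    version = fields[5]
    # '*' (any) and '-' (not applicable) carry no usable version.
    if version in ("*", "-", ""):
        return purl
    return f"{purl}@{version}"


line = (
    "pkg:github/silverstripe/silverstripe-framework"
    "|cpe:2.3:a:silverstripe:framework:4.13.25:*:*:*:*:*:*:*"
)
print(purl_with_cpe_version(line))
# pkg:github/silverstripe/silverstripe-framework@4.13.25
```

Whether a CPE version can always be reused as a purl version is an open question; the two schemes do not necessarily agree on how versions are written.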

inosmeet avatar Mar 11 '24 15:03 inosmeet

@Dev-Voldemort There is a lot of activity in the purl/CPE space at the moment. I would be interested in understanding what your database would be doing, noting that a 1-1 mapping of PURL to CPE is not possible.

anthonyharrison avatar Mar 11 '24 17:03 anthonyharrison

As you said in this comment, some decisions need to be made.

I was thinking: since we will be generating purls ourselves with more info such as version and subpath (i.e. #3833), why not maintain our own database with more informative purls, which might help with more precise mapping (I may be wrong here)? And in cases where something doesn't have a CPE, we can add an appropriate entry for it.

All of this is based on the assumption that we will be utilizing purl2cpe by installing their database.

P.S. I'm in the process of writing a GSOC proposal, and am a little confused about how purl2cpe integration would help without version info.

inosmeet avatar Mar 12 '24 08:03 inosmeet

@Dev-Voldemort I can see various discussions about GSOC and purls. Can I suggest you keep the discussions on the GSOC thread and not in lib4sbom?

Lib4sbom is a SBOM generator/parser library. The data (e.g. PURL) should come from the script that uses lib4sbom; the calling script is responsible for ensuring the data is correct. Validating that the purl is correct (beyond checking that it is in the correct format) is not the responsibility of the library.
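To illustrate what a format-only check might look like, here is a minimal sketch. It is not lib4sbom's actual validation: the name `looks_like_purl` and the regular expression are assumptions, and the pattern is much looser than the full purl specification. It only checks the overall `pkg:type/name@version?qualifiers#subpath` shape, not whether the package exists or the data is right.

```python
import re

# Loose shape check only: pkg:<type>/<namespace...>/<name>[@version][?qualifiers][#subpath]
_PURL_SHAPE = re.compile(
    r"^pkg:(?P<type>[A-Za-z][A-Za-z0-9.+-]*)/"  # scheme and type
    r"(?P<path>[^@?#]+)"                        # optional namespace plus name
    r"(?:@(?P<version>[^?#]+))?"                # optional version
    r"(?:\?(?P<qualifiers>[^#]+))?"             # optional qualifiers
    r"(?:#(?P<subpath>.+))?$"                   # optional subpath
)


def looks_like_purl(value: str) -> bool:
    """Return True if value has the general shape of a purl."""
    return _PURL_SHAPE.match(value) is not None


print(looks_like_purl("pkg:github/silverstripe/silverstripe-framework@4.13.25"))  # True
print(looks_like_purl("github/silverstripe/silverstripe-framework"))              # False
```

Anything beyond this, such as confirming that the purl points at the right package, would stay with the calling script.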

anthonyharrison avatar Mar 13 '24 09:03 anthonyharrison