WIP: Repology Updater

Open captn3m0 opened this issue 3 years ago • 4 comments

What it will do:

Repository Level (This is entirely TODO)

  1. Maintain a list of repology repository prefixes that are relevant
  2. Generate a list of source package directory URLs from (1)
  3. Fetch the list of packages and keep it locally.

Package Level (Some of this is done)

  1. Fetch the list of repology identifiers for a product
  2. These are used to fetch the relevant repology project
  3. For each product, filter the list of packages by a list of repository prefixes
  4. For each such repository, use the list of packages generated above to build a comprehensive list of PURLs (TODO)
  5. Finally, save the list of PURLs to disk
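The filtering step (3) could be sketched roughly as below. This is only an illustration: the prefix list is hypothetical, the sample entries are made up, and the field names (`repo`, `srcname`, `binname`, `visiblename`) are assumed to match what the repology project API returns.

```python
from typing import Iterable

# Hypothetical prefix list; the issue does not say which repositories are relevant.
REPO_PREFIXES = ("debian_", "ubuntu_", "fedora_")


def filter_by_repo_prefix(packages: Iterable[dict], prefixes=REPO_PREFIXES) -> list[dict]:
    """Keep only package entries whose `repo` starts with one of the prefixes."""
    return [p for p in packages if p.get("repo", "").startswith(tuple(prefixes))]


def group_names_by_repo(packages: Iterable[dict]) -> dict[str, set[str]]:
    """Map each repo to the package names reported for it (source name preferred)."""
    by_repo: dict[str, set[str]] = {}
    for p in packages:
        name = p.get("srcname") or p.get("binname") or p.get("visiblename")
        if name:
            by_repo.setdefault(p["repo"], set()).add(name)
    return by_repo


# Made-up sample shaped like entries from /api/v1/project/<name>; versions are placeholders.
sample = [
    {"repo": "debian_12", "srcname": "zookeeper", "version": "1.2.3"},
    {"repo": "homebrew", "visiblename": "zookeeper", "version": "1.2.3"},
]
relevant = filter_by_repo_prefix(sample)
print(group_names_by_repo(relevant))
```

The grouped names would then feed step 4, where each source name is expanded into the packages actually installable from that repository.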

The final version should deliver a clear and comprehensive list of PURLs for a given product, where each PURL represents the latest version of a package available on a specific distribution channel (not necessarily a Linux distro).

These PURLs can then be used to augment scan results, by generating feeds for scanning products. The use case could be:

  1. Use type/namespace/name to check if product is in our database
  2. Use the version against our list from above to see if it is the latest version available on that channel. Give warning if not.
  3. If it is the latest version, check to see if the latest version is considered supported. Additionally, use the channel's support status as well (such as Debian support dates, repository information) to provide a clear guarantee of support.

Depending on results from 1,2,3: return a vulnerability rating. Most of the scanning part can perhaps be done by existing scanners, so we are looking to bootstrap this by generating a "feed" instead.
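A minimal sketch of that three-step check, assuming two in-memory lookups that do not exist yet: `LATEST` (latest version per channel) and `SUPPORTED` (versions considered supported). Both tables, their contents, and the rating labels are hypothetical placeholders.

```python
# Hypothetical index: (type, namespace, name) -> latest version on that channel.
LATEST = {("deb", "debian", "zookeeper"): "1.2.3"}
# Hypothetical set of (key, version) pairs that are considered supported.
SUPPORTED = {(("deb", "debian", "zookeeper"), "1.2.3")}


def rate(ptype: str, namespace: str, name: str, version: str,
         latest=LATEST, supported=SUPPORTED) -> str:
    """Return a coarse rating following steps 1 to 3 above."""
    key = (ptype, namespace, name)
    if key not in latest:                # step 1: product not in our database
        return "unknown"
    if version != latest[key]:           # step 2: not the latest on this channel
        return "outdated"
    if (key, version) not in supported:  # step 3: latest, but not supported
        return "unsupported"
    return "ok"


print(rate("deb", "debian", "zookeeper", "1.2.0"))
```

Exact string comparison of versions is a simplification; a real check would need the channel's own version ordering.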

Feed Details:

  1. A vulnerability feed typically contains information about known vulnerabilities in various products, using package name, channel, and version ranges.
  2. We can generate such a feed from our PURLs and EOL API. Each unsupported release cycle can be used to craft a "pseudo"-vulnerability that triggers on unsupported versions being detected.
  3. The feed will need a lot of exceptions for supported packages on various channels, which is why we need to do repology scraping.
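As a rough sketch of point 2, each unsupported release cycle can be turned into one feed entry. The cycle objects are assumed to carry `cycle` and `eol` fields, where `eol` is either a boolean or an ISO date; the entry layout and the `EOL-…` identifier scheme are invented here for illustration, and the sample data is made up.

```python
from datetime import date


def is_eol(cycle: dict, today: date) -> bool:
    """Treat a cycle as unsupported if `eol` is True or a date on or before today."""
    eol = cycle.get("eol")
    if isinstance(eol, bool):
        return eol
    if isinstance(eol, str):
        return date.fromisoformat(eol) <= today
    return False


def pseudo_vulnerabilities(product: str, cycles: list[dict], today: date) -> list[dict]:
    """Build one hypothetical feed entry per unsupported release cycle."""
    return [
        {
            "id": f"EOL-{product}-{c['cycle']}",
            "product": product,
            "cycle": c["cycle"],
            "summary": f"{product} {c['cycle']} is no longer supported",
        }
        for c in cycles
        if is_eol(c, today)
    ]


# Made-up cycles for a made-up product.
cycles = [
    {"cycle": "1.0", "eol": "2020-01-01"},
    {"cycle": "2.0", "eol": False},
    {"cycle": "3.0", "eol": "2999-01-01"},
]
feed = pseudo_vulnerabilities("example", cycles, date(2023, 1, 1))
print([entry["id"] for entry in feed])
```

The exceptions from point 3 would be applied on top of this, removing entries for packages a channel still supports.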

captn3m0 avatar Dec 23 '22 05:12 captn3m0

Found out that this was a lot more work than I'd expected, due to my flawed understanding of what repology tracks. Repology tracks source packages, where it can, to reduce effort and make tracking easier. This works, since repology is more interested in tracking "what version of a package is available in a repository" rather than "all the various ways this package can be installed".

We're interested in the latter (we want an SBOM -> package -> PURL -> product lookup). But for that, we need an exhaustive list of all packages that are built from a source package. One source package producing several binary packages happens in many cases, but most prominently in the case of Debian and RPM based distros.

For example, https://repology.org/api/v1/project/zookeeper has a single entry for debian bookworm. That entry links it to the zookeeper source package, which is listed at https://packages.debian.org/bookworm/source/zookeeper

That itself gets built into 10 separate binary packages, which are the ones we actually want to track. Generating this mapping is what I'm working on currently. It involves parsing the package files across all distros, and took some effort.
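For DEB distros, the source-to-binary mapping can be read from a `Sources` index, where each stanza has a `Package` field and a comma-separated `Binary` field. The parser below is a minimal sketch and not the code from the branch; the sample stanzas use made-up package names.

```python
def parse_sources(text: str) -> dict[str, list[str]]:
    """Map each source package to its binary packages from Debian `Sources` text."""
    mapping: dict[str, list[str]] = {}
    for stanza in text.strip().split("\n\n"):
        fields: dict[str, str] = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:
                # Continuation line: append to the previous field.
                fields[key] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if "Package" in fields and "Binary" in fields:
            mapping[fields["Package"]] = [
                b.strip() for b in fields["Binary"].split(",") if b.strip()
            ]
    return mapping


# Made-up stanzas in the same layout as a Debian Sources index.
SAMPLE = """\
Package: examplesrc
Binary: example-bin, libexample1,
 libexample-dev
Version: 1.2.3-1

Package: othersrc
Binary: other
Version: 0.1-2
"""
print(parse_sources(SAMPLE))
```

RPM based distros store the equivalent information differently (a source RPM reference on each binary package), so they would need a separate parser.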

Got it working for DEB distros.

captn3m0 avatar Dec 26 '22 14:12 captn3m0

Doing some investigation into MongoDB as an example.

https://repology.org/api/v1/project/mongodb

For Ubuntu, the packages installed are from repo.mongodb.org:

Get:1 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-database-tools amd64 100.6.1 [48.0 MB]
Get:2 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-mongosh amd64 1.6.1 [37.7 MB]
Get:3 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-shell amd64 6.0.3 [3,080 B]
Get:4 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-server amd64 6.0.3 [28.9 MB]
Get:5 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-mongos amd64 6.0.3 [20.3 MB]
Get:6 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-database-tools-extra amd64 6.0.3 [7,752 B]
Get:7 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-database amd64 6.0.3 [3,540 B]
Get:8 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-tools amd64 6.0.3 [2,892 B]
Get:9 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org amd64 6.0.3 [2,932 B]

But repology has no knowledge of this package existing in this repo. Would resolving this be as simple as adding a new repository to repology and then finding the binaries installed from the repo package?

noqcks avatar Dec 31 '22 20:12 noqcks

Since this is a small list, we could easily add static PURLs for all of these. We could scan the repo as well, but that only makes sense for larger, more significant repositories.
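One way to produce such a static list is to turn the apt output above into PURLs. This is a sketch under assumptions: the `deb` type with an `ubuntu` namespace and `arch`/`distro` qualifiers is one plausible choice, but the namespace for packages served from a vendor repository is an open question here.

```python
def apt_line_to_purl(line: str, namespace: str = "ubuntu") -> str:
    """Convert an apt 'Get:' line into a deb PURL (namespace is an assumption)."""
    parts = line.split()
    # Layout: Get:N <url> <suite/component> <arch> <name> <arch> <version> [size]
    suite = parts[2].split("/")[0]
    name, arch, version = parts[4], parts[5], parts[6]
    return f"pkg:deb/{namespace}/{name}@{version}?arch={arch}&distro={suite}"


lines = [
    "Get:3 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-shell amd64 6.0.3 [3,080 B]",
    "Get:4 https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0/multiverse amd64 mongodb-org-server amd64 6.0.3 [28.9 MB]",
]
purls = [apt_line_to_purl(l) for l in lines]
print("\n".join(purls))
```

For a product like MongoDB the versions would drop out of the static list, leaving only type, namespace, and name to match against.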

captn3m0 avatar Jan 01 '23 05:01 captn3m0

@captn3m0 do you have WIP commits on this branch you could push?

Might be able to work in parallel here. I can tackle searching packages in other distros.

noqcks avatar Jan 02 '23 23:01 noqcks