
custom and composite package types for semi-structured repositories

Open hartsock opened this issue 3 years ago • 2 comments

I'm attempting to have my company adopt pURL for internal package identification. I've identified some non-standard package types and some products that produce what are merely tar or zip files.

An extreme example:

https://commondatastorage.googleapis.com/chromium-boringssl-fips/boringssl-ae223d6138807a13006342edfeef32e813246b39.tar.xz

To pURL-ify this URL I might try to write:

pkg:archive/commondatastorage.googleapis.com/chromium-boringssl-fips#boringssl@ae223d6138807a13006342edfeef32e813246b39?format=tar.xz

... which I'm not sure is a good match, but it does encode where the package came from, and you could realistically write a simple algorithm to convert between the two forms. But is this actually more valuable than just keeping the normal download URL the archive was fetched from?
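To make the "simple algorithm" idea concrete, here is a minimal Python sketch of a round trip between the download URL and the ad-hoc `pkg:archive` string above. Everything in it is an assumption: `archive` is not a registered purl type, the URL is assumed to follow `https://<host>/<bucket>/<name>-<version>.<format>`, the list of recognized formats is hypothetical, and the string layout mirrors the form written above rather than the purl spec grammar (where `?qualifiers` come before `#subpath`).

```python
from urllib.parse import urlparse

# Hypothetical list of recognized archive suffixes, checked in order.
KNOWN_FORMATS = ("tar.xz", "tar.gz", "zip")


def url_to_archive_purl(url: str) -> str:
    """Map https://<host>/<bucket>/<name>-<version>.<format> to the ad-hoc form."""
    parsed = urlparse(url)
    bucket, _, filename = parsed.path.lstrip("/").rpartition("/")
    fmt = next((f for f in KNOWN_FORMATS if filename.endswith("." + f)), None)
    if fmt is None:
        raise ValueError(f"unrecognized archive format: {filename}")
    stem = filename[: -(len(fmt) + 1)]
    # Assumes the version is everything after the last "-" in the file name.
    name, _, version = stem.rpartition("-")
    return f"pkg:archive/{parsed.netloc}/{bucket}#{name}@{version}?format={fmt}"


def archive_purl_to_url(purl: str) -> str:
    """Invert url_to_archive_purl for strings of the same ad-hoc shape."""
    prefix = "pkg:archive/"
    if not purl.startswith(prefix):
        raise ValueError("not a pkg:archive string")
    location, _, tail = purl[len(prefix):].partition("#")
    name_version, _, query = tail.partition("?")
    name, _, version = name_version.partition("@")
    quals = dict(pair.split("=", 1) for pair in query.split("&") if pair)
    return f"https://{location}/{name}-{version}.{quals['format']}"
```

A round trip on the BoringSSL URL reproduces the original string, so the mapping is lossless for this particular naming pattern; it would mis-split any file name whose version itself contains a hyphen.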

Similarly, I've found several "products" that are released as a suite of components. A single component is fairly easy to identify, for example as an RPM file. But the entire product requires installing a whole list of RPMs, and these can include third-party RPMs supplied by another company, RPMs custom-compiled for custom hardware, or RPMs with specialized patches applied.

For example, I may have a build target that produces a long list of RPMs intended to be placed in an archive and delivered to an update site ... or, in some legacy situations, on an ISO for customers to use.

To describe these I might try to write:

pkg:rpm-repo/some.rpm-repo.example.com/[email protected]?archive=zip
pkg:rpm-repo/some.rpm-repo.example.com/[email protected]?archive=tar.gz
pkg:rpm-repo/some.rpm-repo.example.com/[email protected]?archive=iso

Another example:

For some reason I decide to trust a package that comes off a website...

https://slproweb.com/download/Win64OpenSSL_Light-3_0_0.msi

... in the human-readable name I see slproweb.com as the entity offering the build. The target is Win64, the given name is OpenSSL, and the version is Light-3.0.0. The archive format is MSI ... which is NOT at all in line with other packages, but let's roll it into the "semi-structured custom package" format support concept.

To pURL-ify this I might say:

pkg:msi/slproweb.com/download#Win64OpenSSL_Light?delim=-&version=3_0_0

... or maybe ...

pkg:bin/slproweb.com/download#Win64OpenSSL_Light?delim=-&version=3_0_0&ext=.msi

... where we can now either reconstruct the original URL or build a URL for a mirror following some semi-arbitrary rules. We still don't really know anything about what an MSI is or what's inside it, and since it's Windows, we don't really care. We're just trying to have content addressing for a ball of data.
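The reconstruction step can likewise be sketched in Python for both strings above. This is a hypothetical sketch: neither `msi` nor `bin` is a registered purl type, `delim`, `version`, and `ext` are the ad-hoc qualifiers proposed here, the implied extension for non-`bin` types is an assumption, and the mirror base URL and its layout are made up for illustration.

```python
from urllib.parse import parse_qsl


def parse_adhoc_purl(purl: str):
    """Split 'pkg:<type>/<location>#<name>?<qualifiers>' (ad-hoc layout, not the spec grammar)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError("expected a pkg: string")
    ptype, _, rest = rest.partition("/")
    location, _, tail = rest.partition("#")
    name, _, query = tail.partition("?")
    return ptype, location, name, dict(parse_qsl(query))


def file_name(ptype: str, name: str, quals: dict) -> str:
    # pkg:bin carries the extension in "ext"; other types imply it (assumption).
    ext = quals.get("ext", "" if ptype == "bin" else "." + ptype)
    return f"{name}{quals.get('delim', '')}{quals['version']}{ext}"


def original_url(purl: str) -> str:
    """Rebuild the download URL from the original host and path."""
    ptype, location, name, quals = parse_adhoc_purl(purl)
    return f"https://{location}/{file_name(ptype, name, quals)}"


def mirror_url(purl: str, mirror_base: str) -> str:
    # Hypothetical mirror layout: <mirror_base>/<original host and path>/<file>
    ptype, location, name, quals = parse_adhoc_purl(purl)
    return f"{mirror_base.rstrip('/')}/{location}/{file_name(ptype, name, quals)}"
```

Both strings resolve to `https://slproweb.com/download/Win64OpenSSL_Light-3_0_0.msi`, and the same parsed fields can be pointed at a hypothetical mirror such as `https://mirror.example.internal/`. The sketch only reconstructs locations; it says nothing about what is inside the file, which matches the "ball of data" framing.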

...

Common theme? These are somewhat arbitrary packages of packages, intended as a way to track candidate releases of entire update archives that might become private Maven repositories, private RPM update repositories, or private versions of other artifact sites like an internal PyPI or Gem service ... or even some opaque, hard-to-understand custom packaging system for a niche OS.

Questions:

  • would this fit well with #127, since this directory-full-of-packages format would allow you either to pull the repository from some.rpm-repo.example.com ... or to express it as a download and cache of that remote repository held in a local filesystem or other mirror? Or how else would you express that concept? Should you, with pURL?
  • should pURL support this kind of "archive" package or directory of files? How? Should there be a few official formats?
  • should pURL support custom package types? How?

hartsock, Oct 08 '21 18:10

Hi @hartsock, thanks for kicking off this discussion. Summarizing, you're pointing out that there are broadly two types of "packages":

  1. Packages from a package manager that provides a structured identifier.
  2. Packages at an arbitrary download location that conform to a specific content type.

The first is handled by the specific purl types, and the main value this brings is a consistent mapping of identification (and location) schemes into a common URL format.

The second, I believe, is the type you're discussing in this issue; is that correct? I believe these should be sufficiently covered by the generic purl type, which supports an unconstrained name and optional version information paired with a download_url. The one gap your issue touches on that the generic type doesn't cover is communicating what type of content to expect at the download_url: for example, is it an MSI? This could be addressed by adding an optional content_type qualifier to the generic type that allows a media type to be specified.
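For comparison, a minimal Python sketch of the generic-type approach: build a `pkg:generic` string whose qualifiers roughly follow the purl spec's rules (lowercase keys, empty values dropped, sorted by key, percent-encoded values). `content_type` is only the qualifier proposed in this comment, not part of the spec, and `application/x-msi` is an illustrative, unregistered media type. In practice a maintained library such as packageurl-python is a safer choice, since implementations differ on exactly which characters they percent-encode.

```python
from typing import Optional
from urllib.parse import quote


def generic_purl(name: str, version: Optional[str] = None, **qualifiers: str) -> str:
    """Build a pkg:generic string; encoding choices here are illustrative."""
    purl = "pkg:generic/" + quote(name, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    # Lowercase keys, drop empty values, sort by key, percent-encode values.
    pairs = sorted((k.lower(), v) for k, v in qualifiers.items() if v)
    if pairs:
        purl += "?" + "&".join(f"{k}={quote(v, safe='')}" for k, v in pairs)
    return purl


print(generic_purl(
    "Win64OpenSSL_Light",
    "3_0_0",
    download_url="https://slproweb.com/download/Win64OpenSSL_Light-3_0_0.msi",
    content_type="application/x-msi",  # proposed qualifier, hypothetical value
))
```

Unlike the ad-hoc forms earlier in the thread, nothing here needs a custom type: the name and version stay unconstrained, and the download location and (proposed) media type travel as qualifiers.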

Do you think this would address the use cases you're considering?

iamwillbar, Oct 11 '21 03:10

@iamwillbar I'm going to have a few off-thread conversations with other folks to see if leveraging generic is good enough for their specific use-cases. I'll circle back here when I've had those side-bars and summarize.

hartsock, Oct 11 '21 18:10

@hartsock I'm going to close this issue for now. If generic hasn't solved the problem for you, feel free to re-open.

iamwillbar, Aug 26 '22 01:08