guac icon indicating copy to clipboard operation
guac copied to clipboard

[feature] Fetch the `purl` types dynamically

Open antgamdia opened this issue 2 years ago • 9 comments

Is your feature request related to a problem? Please describe.

When importing files, like an SPDX, including unsupported purl types, the ingestion fails with a message like:

unable to ingest doc tree: unable to parse purl pkg:bitnami/[email protected]

This is happening because of:

https://github.com/guacsec/guac/blob/0fec13b490ac0b84c0770b3f0ccbe09cb4504efb/pkg/assembler/helpers/purl.go#L106-L117

Since the purl type is not hardcoded there, then it will yield an error.

It shouldn't be a problem except for the fact that the purl type list is dynamic (for example, we are adding the bitnami type at https://github.com/package-url/purl-spec/pull/231), so hardcoding the possible values in the GUAC code does not seem to scale.

Describe the solution you'd like

Ideally, if there is any way to query all the possible values of a purl type, then it would be great. I don't know how feasible it could be, though. Alternatively, a simple workaround could be an opt-in config flag telling the parser to accept "unknown" package types, maybe. However, I don't know the side effects of this.

Describe alternatives you've considered I have locally added my custom purl type and I have successfully been able to ingest my SPDX, but I'm not sure about the side effects yet.

Additional context N/A

antgamdia avatar Jul 19 '23 14:07 antgamdia

Hi @antgamdia ! Thanks for opening this! I think you've described the situation around PURLs pretty well. Our decision at the time to handle each case was deliberate due to PURLs not necessarily being consistent in the meaning they convey across ecosystems (example: OCI). Thus we wanted to be intentional when a new ecosystem comes up that we handle it.

Would you be able to open that PR to handle that bitnami PURL type and we'll get it merged in?

Your opt-in "handle unknown PURL" flag sounds very tempting but I think we'd want to be careful that it doesn't become the default. I think this would be something i'd like to discuss, since i've had my fair share of PURLs which aren't actually PURLs but the information is useful. Perhaps there's a middleground to handle this to say that this is a heuristic or unhandled purl type.

lumjjb avatar Jul 19 '23 14:07 lumjjb

I would be against adding a flag for "unknown PURLs" as the helper function converts to a model.PkgInputSpec and this is used to determine the mapping into the backend databases. If we allow "unknown" purls to be mapped, we could end up with a bunch of useless data in the database that does not query properly. Rather we should make updates to the helper function as needed and open issues as we encounter new additional types to ensure proper parsing of the new purl types.

pxp928 avatar Jul 19 '23 15:07 pxp928

Our decision at the time to handle each case was deliberate due to PURLs not necessarily being consistent in the meaning they convey across ecosystems (example: OCI). Thus we wanted to be intentional when a new ecosystem comes up that we handle it.

Fair point. Manually handling each new type that may pop up in the ecosystem seems to be the safest way to ensure GUAC is actually compatible.

Your opt-in "handle unknown PURL" flag sounds very tempting but I think we'd want to be careful that it doesn't become the default.

(...) I think this would be something i'd like to discuss, since i've had my fair share of PURLs which aren't actually PURLs but the information is useful. I would be against adding a flag for "unknown PURLs" (...) Rather we should make updates to the helper function as needed and open issues as we encounter new additional types to ensure proper parsing of the new purl types.

Yep, that's my point. Rather than rejecting the information, we could still ingest it and use it somehow.... but this could clearly lead to what you mention: a bunch of "un-queryable" information stored in the DB.

However, I'm not familiar yet with the ontology used in GUAC, but maybe handling unknown package types as another legit source of information could potentially be useful for some organizations. What if the org is generating some internal packages but they are not published nor do they want to create a new purl type? Perhaps it is too specific and it is not really worth it, I kind of understand it.

Another idea, rather than accepting anything unknown, could be an opt-in option to ignore it instead. It is safer than ingesting the unknown, yet useful for people using unsupported types. For instance, in the test I was performing, I had a quite large SPDX with plenty of packages, but, just ~10 were "bitnami" purl type ones: if I had had the ignore option, I would have been able to ingest the SPDX and use it. However, instead, I had to manually look up those offending purls and delete them manually in a text editor in order to get my file ingested by GUAC.

As GUAC continues to be an active project, with people vigilant on any unsupported format, then it's ok... but I'm not that sure that blocking the ingest if something unsupported comes is the best/more scalable way to go :S

A quick win, perhaps, could be just sth like:

	default:
		if _, ok := purl.SupportedTypes[p.Type]; !ok {
			// If the type is supported by the purl library, but not handled above, then
			// we should add a case for it above.
			return nil, fmt.Errorf("unhandled existing PURL type: %q. Please, report it at https://github.com/guacsec/guac/issues/new", p.Type)
		}

		// unhandled types should throw an error so we can make sure to review the
		// implementation of newly introduced PURL types.
		return nil, fmt.Errorf("unhandled unknown PURL type: %q", p.Type)
	}

I could check with purl folks to see if the purl.SupportedTypes map could actually be a thing - currently, it is not (edit: I've filed an issue here: https://github.com/package-url/packageurl-go/issues/59)

Would you be able to open that PR to handle that bitnami PURL type and we'll get it merged in?

Yes, absolutely. As soon as the new PURL type gets officially merged, I'll raise a new PR here.

Thanks for your input and quick response!

antgamdia avatar Jul 20 '23 11:07 antgamdia

Great write up @antgamdia. I like the idea you proposed of returning fmt.Errorf("unhandled existing PURL type: %q. Please, report it at https://github.com/guacsec/guac/issues/new", p.Type). That would make it easy for a user to know which type is unsupported and can be quickly addressed. For now, we could have a guac "supportedTypes" until the upstream purl folks support SupportedTypes.

However, I'm not familiar yet with the ontology used in GUAC, but maybe handling unknown package types as another legit source of information could potentially be useful for some organizations. What if the org is generating some internal packages but they are not published nor do they want to create a new purl type?

So we currently handle that where we use heuristics to create a GUAC-specific purl based on the name and version of the unknown package (purl is not provided in the SBOM). That might be a safer option rather than creating a purl that is not accepted by the community.

https://github.com/guacsec/guac/blob/d438521f8f556fde27c4632a4b97f4f21d4bfd7e/pkg/assembler/helpers/purl.go#L211-L229

Another idea, rather than accepting anything unknown, could be an opt-in option to ignore it instead. It is safer than ingesting the unknown, yet useful for people using unsupported types.

This can also be an option that still outputs error messages of unknown purl types (similar to the error message you provided above) without blocking. We just have to ensure that the user understands that using this option could result in missed information.

pxp928 avatar Jul 20 '23 11:07 pxp928

+1.

Another potential thing we could do on top of that as well is have unknown purls have type unknown+purltype and pkgEqual to a handling of that purl. To point to the default parsed purl type.

On Thu, Jul 20, 2023 at 7:58 AM Parth Patel @.***> wrote:

Great write up @antgamdia https://github.com/antgamdia. I like the idea you proposed of returning fmt.Errorf("unhandled existing PURL type: %q. Please, report it at https://github.com/guacsec/guac/issues/new", p.Type). That would make it easy for a user to know which type is unsupported and can be quickly addressed.

However, I'm not familiar yet with the ontology used in GUAC, but maybe handling unknown package types as another legit source of information could potentially be useful for some organizations. What if the org is generating some internal packages but they are not published nor do they want to create a new purl type?

So we currently handle that where we use heuristics to create a GUAC-specific purl based on the name and version of the unknown package (purl is not provided in the SBOM). That might be a safer option rather than creating a purl that is not accepted by the community.

https://github.com/guacsec/guac/blob/d438521f8f556fde27c4632a4b97f4f21d4bfd7e/pkg/assembler/helpers/purl.go#L211-L229

Another idea, rather than accepting anything unknown, could be an opt-in option to ignore it instead. It is safer than ingesting the unknown, yet useful for people using unsupported types.

This can also be an option that still outputs error messages of unknown purl types (similar to the error message you provided above) without blocking. We just have to ensure that the user understands that using this option could result in missed information.

— Reply to this email directly, view it on GitHub https://github.com/guacsec/guac/issues/1071#issuecomment-1643792444, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXLDBTMTDXKOSZW7K4NRVLXREMMVANCNFSM6AAAAAA2QAIOQ4 . You are receiving this because you commented.Message ID: @.***>

lumjjb avatar Jul 20 '23 13:07 lumjjb

Thanks again for your input on this!

Just for the record, the bitnami purl was recently accepted and merged in the spec. Besides, I have created two PRs in the packageurl-go package they provide to add the type and the SupportedType we talked about - don't know when they will get merged, though-. They are https://github.com/package-url/packageurl-go/pull/60 and https://github.com/package-url/packageurl-go/pull/61.

antgamdia avatar Jul 31 '23 10:07 antgamdia

Great! Thanks, @antgamdia. Looking forward to getting this added to guac once it merges upstream.

pxp928 avatar Jul 31 '23 11:07 pxp928

Both PRs were just merged! The upcoming packageurl-go should include both the bitnami purl as well as two exported maps KnownTypes and CandidateTypes holding each possible accepted value in the spec :)

antgamdia avatar Aug 14 '23 16:08 antgamdia

The purl specification refers to known types versus stating allowed types. It articulates the rules for the allowed values for the type attribute with encoding rules. Only expressly supporting those types expressed makes the tools unable to handle non-public package types or future types.

I would ask that you accept any type, based on the parsing rules. Falling back to the generic type or any limited capability but ingest the purl so its relationships can be utilized.

rvanider avatar Oct 11 '23 21:10 rvanider