feat: add BeamVM Hex support

Open cpendery opened this issue 2 years ago • 7 comments

📝 Description

Adds support for parsing rebar.lock and mix.lock files, which enables cataloguing of Elixir and Erlang projects that use the Hex package manager. The parsers are placed under the beam cataloger.

Closes: https://github.com/anchore/syft/issues/1071

cpendery avatar Jun 29 '22 12:06 cpendery

Thanks for another great contribution @cpendery! I do think having beam as the language is going to be confusing. My suggestion would be to break it up into a separate erlang cataloger for rebar.lock and elixir cataloger for mix.lock, and both having the hex package type. I'm not sure if sharing the same package type between two languages will cause any other problems though.

westonsteimel avatar Jul 01 '22 13:07 westonsteimel

If we split the languages into Elixir and Erlang, then when we see a purl for the Hex package manager (pkg:hex/, which supports both languages), I wouldn't know which language to resolve it to. I get that using the ecosystem as a language is confusing. Any ideas, @westonsteimel?

cpendery avatar Jul 01 '22 14:07 cpendery

Hmm, can we encode the language as a parameter to the package url when syft creates it? @spiffcs, any thoughts? Do we ever need to decode language from a package url (perhaps this is something that happens from one of the non syft-native formats)?

westonsteimel avatar Jul 01 '22 19:07 westonsteimel

The purl specification doesn't have a language property for the Hex purl type, unfortunately. However, it seems like an advantageous thing to add, so maybe @pombredanne would be up for adding this! :smile:

seabass-labrax avatar Jul 02 '22 15:07 seabass-labrax

It could be useful for Conan (C/C++) and Cocoapods (Objective-C/Swift) since both support dual languages like Hex. I made a PR below just as a starting point for the discussion in the purl-spec repo, since it may be a better place to continue talking

cpendery avatar Jul 02 '22 15:07 cpendery

After forming the PURL we have a decoding function called LanguageFromPURL: https://github.com/anchore/syft/search?q=LanguageFromPURL

This would lead to the issue @cpendery brings up where we don't have enough information at that point to assign a language. If his PR is accepted then we can make the split in the specification itself and no longer encounter this issue. Because of this design choice, catalogers are loosely bound to support on the PURL side.
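To make the ambiguity concrete, here is a small sketch of a type-to-language lookup. It is not syft's LanguageFromPURL implementation: the `languagesByPURLType` table, `purlType`, and `languageFromPURLType` are hypothetical names, and returning an empty string for an ambiguous type is just one possible way to handle it.

```go
package main

import (
	"fmt"
	"strings"
)

// languagesByPURLType is an illustrative table, not syft's actual mapping.
var languagesByPURLType = map[string][]string{
	"npm":   {"javascript"},
	"cargo": {"rust"},
	"hex":   {"erlang", "elixir"}, // one package type, two languages
}

// purlType extracts the type from a purl of the form "pkg:type/name@version".
func purlType(purl string) (string, bool) {
	if !strings.HasPrefix(purl, "pkg:") {
		return "", false
	}
	rest := strings.TrimPrefix(purl, "pkg:")
	i := strings.Index(rest, "/")
	if i <= 0 {
		return "", false
	}
	return strings.ToLower(rest[:i]), true
}

// languageFromPURLType returns a language only when the purl type maps to
// exactly one language; otherwise it returns "" (unknown).
func languageFromPURLType(purl string) string {
	typ, ok := purlType(purl)
	if !ok {
		return ""
	}
	langs := languagesByPURLType[typ]
	if len(langs) != 1 {
		return ""
	}
	return langs[0]
}

func main() {
	for _, p := range []string{"pkg:cargo/serde@1.0.0", "pkg:hex/jason@1.4.0"} {
		fmt.Printf("%s -> %q\n", p, languageFromPURLType(p))
	}
}
```

The purl type alone carries enough information for cargo, but not for hex, which is why the decoding step cannot pick between Erlang and Elixir without extra data in the purl itself.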

spiffcs avatar Jul 05 '22 14:07 spiffcs

@cpendery this looks really good. I'll hold off on merging or updating it in any way until we hear back on the PR you made for the purl-spec.

spiffcs avatar Jul 05 '22 14:07 spiffcs

TODO: update cataloger to new generic cataloger pattern

spiffcs avatar Dec 20 '22 03:12 spiffcs

I think we can leave language as blank / unknown in these circumstances -- the cataloger is more valuable than resolving the language from the pURL IMHO.

I can help rebase what is here and update the patterns some based on the drift.

wagoodman avatar Jan 12 '23 14:01 wagoodman

The main changes I made were:

  • Split the cataloger into erlang and elixir. The conflict for parsing the language from the pURL has been kicked down the road.
  • Split the HexMetadata into MixLockMetadata and RebarLockMetadata. Why do this when they contain essentially the same information? The two sources aren't exactly the same, so if we intend to extract more information from them later, we want room to grow without making a breaking change. A package type may map to many different metadata types, and each metadata type should represent the source you are parsing from as closely as possible. This is the general rule of thumb.
  • Updated the cataloger to use the new generic cataloger and surrounding patterns.
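The metadata split described above can be sketched roughly as follows. The field names, the `pkgRecord` type, the `hexPkgType` constant, and the two constructor functions are assumptions made for illustration and may not match the types that were actually pushed.

```go
package main

import "fmt"

// hexPkgType is the package type shared by both catalogers (assumed constant name).
const hexPkgType = "hex"

// MixLockMetadata mirrors what a mix.lock entry provides (fields are assumed).
type MixLockMetadata struct {
	Name, Version, PkgHash, PkgHashExt string
}

// RebarLockMetadata mirrors what a rebar.lock entry provides (fields are assumed).
// It is a separate type so either one can grow without breaking the other.
type RebarLockMetadata struct {
	Name, Version, PkgHash, PkgHashExt string
}

// pkgRecord is a hypothetical stand-in for a cataloged package.
type pkgRecord struct {
	Name, Version, Type, Language string
	Metadata                      interface{}
}

// newMixLockPackage builds an Elixir package from mix.lock metadata.
func newMixLockPackage(m MixLockMetadata) pkgRecord {
	return pkgRecord{Name: m.Name, Version: m.Version, Type: hexPkgType, Language: "elixir", Metadata: m}
}

// newRebarLockPackage builds an Erlang package from rebar.lock metadata.
func newRebarLockPackage(m RebarLockMetadata) pkgRecord {
	return pkgRecord{Name: m.Name, Version: m.Version, Type: hexPkgType, Language: "erlang", Metadata: m}
}

func main() {
	a := newMixLockPackage(MixLockMetadata{Name: "jason", Version: "1.4.0"})
	b := newRebarLockPackage(RebarLockMetadata{Name: "cowlib", Version: "2.11.0"})
	fmt.Println(a.Type, a.Language, "|", b.Type, b.Language)
}
```

Both packages share the hex type, while the language comes from the cataloger that produced them and the metadata type records which lock file was the source.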

I'll push shortly, and I think this will be good to go!

wagoodman avatar Jan 12 '23 15:01 wagoodman