purl-spec
feat: add language qualifier for multi-language package managers
📝 Description
This adds a language qualifier to purls for package managers whose packages can be written in any of several source languages, so that a package's language can be identified.
Related to:
- https://github.com/anchore/syft/pull/1083
- https://github.com/anchore/syft/pull/1081
- https://github.com/anchore/syft/pull/1073
@stevespringett for input
Also see #168, which appears to introduce arbitrary qualifiers for conan.
Can there ever be a package which contains source code written in two or more languages?
Yes. This occurs in Maven Central, where a project may combine Java, Scala, and Kotlin source languages. These projects will also contain XML or Groovy depending on the build system used (Maven uses XML, Gradle uses Groovy). Maven projects can also include "resources", which typically consist of key/value properties and XML and JSON configuration, but technically any language can be included there. Any application that depends on Mozilla Rhino, for example, would typically also include JavaScript in one or more of its dependencies. The same applies to any other library that allows non-native languages to be executed on the JVM.
It will also occur with npm, where both JavaScript and TypeScript are used. By default, npm also allows arbitrary scripts to run at install time. These scripts could be bash, PowerShell, etc., which would be yet another language.
One additional point on Maven, which is likely true for others as well: if the Maven type is war (web archive), then you can expect Java, Scala, or Kotlin source languages for the backend, and HTML, CSS, and JavaScript for the frontend, all in a single package.
- Looks like there's a fixed set of languages suggested for these package managers. Is there an authoritative source for those? (For future-proofing purposes)
I don't know of any fixed set for these package managers, so it may be better to have a naming convention for this field so the spec doesn't need to change when support for a new language is added. I suggest lowercase snake case, using only characters that do not require percent-encoding (thinking of cpp vs c%2b%2b), which seems to follow purl-spec convention. This would yield: objective_c, c, cpp, elixir, swift, erlang.
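To illustrate the naming convention proposed above, here is a minimal sketch in Python. It is not part of the PR or of purl-spec: the function names, the `ALIASES` table, and the `language` qualifier key are assumptions made for the example.

```python
import re
from urllib.parse import quote

# Hypothetical alias table: names whose symbols would otherwise need
# percent-encoding get a plain spelling instead.
ALIASES = {"c++": "cpp"}


def normalize_language(name: str) -> str:
    """Return a lowercase snake_case token for a language name."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", key).strip("_")


def language_qualifier(name: str) -> str:
    """Build a `language=<value>` qualifier (the key name is assumed)."""
    # quote() is a no-op here because the normalized value contains only
    # letters, digits, and underscores; it is kept to show that nothing
    # needs encoding.
    return "language=" + quote(normalize_language(name), safe="")


if __name__ == "__main__":
    for lang in ["Objective-C", "C", "C++", "Elixir", "Swift", "Erlang"]:
        print(language_qualifier(lang))
```

Without the alias, the raw name `c++` would have to be percent-encoded in the qualifier value, which is the readability problem the convention avoids.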