What's the use of namespace?
Why can't each type of purl simply have a name that might (or might not) be hierarchical?
The resulting purl would be the same, e.g. github:package-url/purl-spec@244fd47e07d1004f0aed9c, but it would be decomposed into github : package-url/purl-spec @ 244fd47e07d1004f0aed9c.
What is the advantage of having namespaces?
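To make the two decompositions concrete, here is a minimal Go sketch, not the spec's parser. It assumes the simplified `type:namespace/name@version` form used in this comment and ignores qualifiers, subpaths, and percent-encoding. The `Coordinates` type and the `splitSimple` and `FullName` helpers are hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Coordinates is a hypothetical holder for the pieces discussed in this thread.
type Coordinates struct {
	Type      string
	Namespace string // empty when the name contains no slash
	Name      string
	Version   string
}

// splitSimple parses the simplified "type:namespace/name@version" form.
// It is an illustration only: no qualifiers, subpaths, or percent-decoding.
func splitSimple(s string) (Coordinates, error) {
	typ, rest, ok := strings.Cut(s, ":")
	if !ok || typ == "" {
		return Coordinates{}, fmt.Errorf("missing type in %q", s)
	}
	// The version is whatever follows the last "@", if there is one.
	version := ""
	if i := strings.LastIndex(rest, "@"); i >= 0 {
		rest, version = rest[:i], rest[i+1:]
	}
	// The name is the last slash-separated segment; what precedes it is the namespace.
	c := Coordinates{Type: typ, Name: rest, Version: version}
	if i := strings.LastIndex(rest, "/"); i >= 0 {
		c.Namespace, c.Name = rest[:i], rest[i+1:]
	}
	return c, nil
}

// FullName returns the single hierarchical name (namespace and name merged).
func (c Coordinates) FullName() string {
	if c.Namespace == "" {
		return c.Name
	}
	return c.Namespace + "/" + c.Name
}

func main() {
	c, err := splitSimple("github:package-url/purl-spec@244fd47e07d1004f0aed9c")
	if err != nil {
		panic(err)
	}
	// With a separate namespace: four components.
	fmt.Println(c.Type, c.Namespace, c.Name, c.Version)
	// With a single hierarchical name: three components.
	fmt.Println(c.Type, c.FullName(), c.Version)
}
```

Either way the string form is identical; the only difference is whether the split at the last slash is done by the spec or left to the consumer.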
@zvr this is just recognizing that these things exist in the wild: for instance, the npm scope is a form of namespace, so is the group ID in maven, the user or org in GH or BB is of the same kind, and so is the golang "repo path". To your point, we could indeed merge the name and namespace into a single component that would just be slash-separated. The semantics stay exactly the same. I actually like this, as it makes the spec even cleaner.
@ashcrow @andrew what do you think? @zvr would you fancy taking a stab at a PR?
Personally I like having namespacing separate. As an example, python-slip is a Fedora package as well as a Debian package. I feel like it's easier to identify it as the same package in different distributions when the namespace is split from the name. In other words one could easily say:
    if instance.Name == another.Name {
        // Compare against the other package's namespace, not our own.
        if instance.Namespace != another.Namespace {
            fmt.Println("Same project, different distributor")
            // specific logic...
        } else {
            fmt.Println("Same project and distribution")
            // specific logic
        }
    }
rather than doing the split yourself to do the check if the initial portion matches. There is a wrench thrown in that not everyone names things the same, but I still feel like it's a nice separation.
I wouldn't freak out if we consider namespace as part of the name.
@ashcrow I agree. This is a great point
I beg to differ. Do you really think that all repositories named "foo" by different users on github are the same or equivalent? Even with the name in your example (which is the same in Debian and Fedora), I can see github:nphilipp/python-slip and github:OpenMandrivaAssociation/python-slip.
@zvr this is correct, yet as @ashcrow says, being able to sort on names without the namespace is a useful approximation in practice. It does not mean that the same name implies the same package, but it helps. Here is a comment I made in reply to @tgamblin in https://github.com/package-url/purl-spec/pull/1#issuecomment-347227409
@tgamblin you wrote:
looks cool! Two questions and kind of one for @andrew: Thanks!
- The original referenced issue says the goal is to have a "unique" identifier for each package, although the spec doesn't seem to dwell on that too much, which is probably good. Do you have ideas on how to reconcile the same package fetched from multiple sources? e.g., the same Python package might exist in pypi, conda, spack, and system package managers. @andrew: does libraries.io do anything to reconcile the different names?

I kinda like to think of these as "mostly" unique, at least unique if a package manager/type provides some unicity within its standard package manager and within a repo/registry of these. Most provide such a guarantee.

As for things being the same, I would think this is something that a DB of purls can help with. There is an amazing graph of relations among the packages: one upstream package may be repackaged in a Linux distro, have its source on GH and BB, be bundled or packaged on Conda, spack, as RubyGems, etc.

For me, I intend to maintain such relationships in https://github.com/nexB/vulnerablecode (e.g. relate a CPE and several purls together and relate this cluster to a vulnerability); and I capture some relationships in https://github.com/nexB/scancode-toolkit/blob/275-streamline-package-manifests-models/src/packagedcode/models.py#L237 (e.g. this srpm is the sources of this rpm).

Finding that two packages are the same is not a trivial matter though. I know of two efforts in that domain, focused on Linux mostly:
- oswatershed by @tannewt which is now dormant https://web.archive.org/web/20140531044841/http://oswatershed.org/ ... the most up to date fork is at: https://github.com/pombredanne/open-source-watershed
- @repology https://repology.org/ by @AMDmi3 which is reasonably new and actively developed.
In all cases, this is hard and @AMDmi3 does a rather superb job in this domain with his concept of "meta package"
So in a nutshell, I think namespace is useful to keep as a separate component from the name. The same name sometimes means two packages may be the same. But this is a helper only.
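As a rough illustration of "helper only", the Go sketch below groups packages by bare name to produce candidates for review, and nothing more. The `Package` type and `candidatesByName` function are hypothetical, and the `deb` and `rpm` entries are assumed sample data; the two `github` entries are the ones mentioned earlier in this thread.

```go
package main

import "fmt"

// Package is a hypothetical, minimal view of a purl's coordinates.
type Package struct {
	Type, Namespace, Name string
}

// candidatesByName groups packages that share a name across types and
// namespaces. A shared name is only a hint that two packages may be
// related, so the result is a list of candidates, not a list of matches.
func candidatesByName(pkgs []Package) map[string][]Package {
	groups := make(map[string][]Package)
	for _, p := range pkgs {
		groups[p.Name] = append(groups[p.Name], p)
	}
	// Names seen only once have nothing to be compared against.
	for name, group := range groups {
		if len(group) < 2 {
			delete(groups, name)
		}
	}
	return groups
}

func main() {
	pkgs := []Package{
		{Type: "deb", Namespace: "debian", Name: "python-slip"}, // assumed sample data
		{Type: "rpm", Namespace: "fedora", Name: "python-slip"}, // assumed sample data
		{Type: "github", Namespace: "nphilipp", Name: "python-slip"},
		{Type: "github", Namespace: "OpenMandrivaAssociation", Name: "python-slip"},
		{Type: "github", Namespace: "package-url", Name: "purl-spec"},
	}
	for _, p := range candidatesByName(pkgs)["python-slip"] {
		fmt.Printf("%s:%s/%s\n", p.Type, p.Namespace, p.Name)
	}
}
```

Deciding which of those candidates really are the same upstream project still needs curated rules or a relationship database, as described above.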
Serious name alignment across package types and distros is not trivial, as tackled by @AMDmi3: just check the many deliciously crafted name mapping rules he came up with in https://github.com/repology/repology/blob/master/rules.d/70.global-python.yaml for Python packages, and others in https://github.com/repology/repology/blob/master/rules.d/
@zvr does this answer your question to your satisfaction? If so, please feel free to close.
Well, I understand the need to sometimes use the last part of the whole namespace+name string. It is similar to a database keeping filenames and wanting to separate directory and filename. On the other hand, a different need might need to separate basename and filename extension, so it's not always clear what the breakdown in components should be.
I would still personally prefer to be able to use the namespace+name as a single element. Maybe we can call it something like "complete name" or "fully qualified name" (similar to FQDN)...
Feel free to close this issue, if you think that no further discussion is needed.
Is it worth having another element which houses the namespace + name all as one? Or should we leave it up to the implementations to decide whether they expose the combination in a variable?
@ashcrow I am fine either way :)
So we have a namespace and a name today as discrete data elements, and they can be combined as needed, where needed. Is there anything more that needs ironing out in the spec then? I do not think so.
Many/most component coordinate systems have a concept of namespace or group vs. name or id. So including that concept here is IMO very natural: it encompasses the naming schemes used by most, and honestly I think it can cover anything else as well. Generally, packaging systems do have some form of classification made of a namespace or group, and then a name or identifier under that namespace or group. Defining this as a spec requirement allows for much wider and more flexible adoption with other formats' naming schemes, and for those that do not have such a concept it's simply not used. But most do have this concept, so if it didn't exist, more of the burden would fall on each format implementation to extract namespace and name from the id field. That would also imply additional URL encoding and make it harder for generic systems to comprehend these 2 key parts of a coordinate.
The spec today already splits the namespace into normalized parts too, so if you had some overly complex naming scheme you wanted to express, you could still do it with the 2 simple namespace and name parts of a PURL.
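A small Go sketch of what "normalized parts" could look like in practice. This is an illustration under assumptions, not the spec's algorithm: `namespaceSegments` and `joinCoordinate` are hypothetical helpers, dropping empty segments is an assumption made here, and percent-encoding is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceSegments splits a namespace on "/" into its parts.
// Leading, trailing, and doubled slashes yield no segment here,
// which is an assumption of this sketch.
func namespaceSegments(ns string) []string {
	return strings.FieldsFunc(ns, func(r rune) bool { return r == '/' })
}

// joinCoordinate rebuilds the combined "namespace/name" string from
// namespace segments and a name. An empty namespace yields just the name.
func joinCoordinate(segments []string, name string) string {
	parts := append(append([]string{}, segments...), name)
	return strings.Join(parts, "/")
}

func main() {
	// A multi-level namespace, as in a golang "repo path".
	segs := namespaceSegments("/github.com/gorilla/")
	fmt.Println(segs)
	fmt.Println(joinCoordinate(segs, "context"))

	// A type with no namespace concept simply leaves it empty.
	fmt.Println(joinCoordinate(namespaceSegments(""), "lodash"))
}
```

A deep or unusual hierarchy still fits in the two fields: everything before the last segment goes in the namespace, and the last segment is the name.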