
Project naming guidelines

Open akien-mga opened this issue 8 years ago • 4 comments

As discussed in #356, we need to define some guidelines for naming projects on Anitya.

There is a consensus that it should be the upstream name, and that distro-specific naming variations are meant to be handled via mappings. However, this brings up some important questions for which we'll need some pointers in the guidelines:

How to determine the upstream name?

  • The first point is what the scope of the upstream name is. Should it only serve to identify a given project (both to users via the search engine and to the database as unique (project name, homepage) tuples), or is it also used by Anitya in some situations?
  • Then I guess the upstream name we choose should answer the question "How does upstream name their own project?", with some practical nuances, especially regarding capitalization and the use of spaces:
      • Should we enforce lowercase names (taken e.g. from the upstream URL, version control repo or main binary), or accept capitalized names if used by upstream (e.g. supertuxkart vs SuperTuxKart)?
      • Same question for spaces (e.g. extremetuxracer vs Extreme Tux Racer).
      • Whatever we choose for the above, I think we should enforce it, so that we can't have both extremetuxracer and Extreme Tux Racer in the database. My preference would go to enforcing lowercase "URL-like" names, but this is open to discussion.
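
To make the "lowercase URL-like names" option concrete, here is a minimal Python sketch of what such a rule could look like. The function name `normalize_name` and the exact rule (lowercase, strip all whitespace) are assumptions for illustration only; this is not Anitya's actual behavior, and the guidelines might settle on something different.

```python
import re


def normalize_name(name: str) -> str:
    """Collapse a display name into a lowercase, space-free key.

    Hypothetical rule for illustration: lowercase everything and remove
    all whitespace, so naming variants map to the same string.
    """
    return re.sub(r"\s+", "", name.strip().lower())


# Both spellings from the discussion collapse to one key.
print(normalize_name("Extreme Tux Racer"))
print(normalize_name("SuperTuxKart"))
```

With a rule like this applied at submission time, "Extreme Tux Racer" and "extremetuxracer" could not coexist as separate entries, at the cost of losing upstream's own capitalization.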

Ideally we should try to review various projects and see how we would parse the information they give (I mean "parse" mentally, not automatically) to decide what upstream name is the right one, and thus what naming guidelines we should enforce.

How to differentiate different upstream projects with the same upstream name?

Anitya allows setting the same upstream name for two different projects if the homepage differs, so that should cover most use cases.

However, this implies that the homepage must be carefully chosen to specifically identify the upstream project whose version we want to track, which might not always be straightforward.

In particular, for bindings we can end up in tricky situations, such as for msgpack: the cpp, java, php and ruby bindings of the official upstream msgpack project are all maintained in the same GitHub repo, and their respective releases are versioned separately and differentiated only by their tags: https://github.com/msgpack/msgpack/releases. The homepage for all of them would a priori be http://msgpack.org, so if we define the upstream name as msgpack according to both the homepage and GitHub repository name, we're stuck.
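
The collision can be illustrated with a small Python sketch of a registry keyed on the (project name, homepage) tuple. `ProjectRegistry` and `DuplicateProjectError` are hypothetical names made up for this example, and the second homepage is a placeholder; the sketch only mirrors the uniqueness rule described above, not Anitya's real database code.

```python
class DuplicateProjectError(ValueError):
    """Raised when a (name, homepage) pair is already registered."""


class ProjectRegistry:
    """Toy registry that enforces uniqueness on (name, homepage)."""

    def __init__(self):
        self._projects = {}

    def add(self, name, homepage):
        key = (name, homepage)
        if key in self._projects:
            raise DuplicateProjectError(
                f"{name!r} is already registered for {homepage}"
            )
        self._projects[key] = {"name": name, "homepage": homepage}
        return key


registry = ProjectRegistry()
registry.add("msgpack", "http://msgpack.org")
# Same name with a different homepage is accepted.
registry.add("msgpack", "https://example.org/unrelated-msgpack")

# A second binding with the same name and homepage is rejected.
try:
    registry.add("msgpack", "http://msgpack.org")
except DuplicateProjectError as exc:
    print("rejected:", exc)
```

Under this rule, the cpp, java, php and ruby bindings cannot all be tracked as "msgpack" with http://msgpack.org as homepage; either the name or the homepage has to carry the distinction.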

Note that the above is just some food for thought, I'd be glad to get input from various developers and users before we can write down proper guidelines.

akien-mga, Oct 28 '16 11:10

The first point is what the scope of the upstream name is. Should it only serve to identify a given project, or is it also used by Anitya in some situations?

I think answering this one answers quite a few of the questions you're raising. The project name is used by Anitya when trying to find the release information. For most backends, that means when trying to find the latest tarball; for others (such as pypi, pear, npm...) where the forge offers an API, it should be the name used on that forge.
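
As a rough illustration of why the name has to match the forge for API-backed backends, the Python sketch below builds the lookup URL directly from the project name. The helper `release_info_url` and the template table are assumptions for this example and not Anitya's backend code; the two URL patterns are the public PyPI JSON endpoint and the npm registry endpoint.

```python
# Hypothetical mapping from backend name to a URL template; the project
# name is substituted verbatim, so it must match the forge's own name.
API_TEMPLATES = {
    "pypi": "https://pypi.org/pypi/{name}/json",
    "npm": "https://registry.npmjs.org/{name}",
}


def release_info_url(backend: str, name: str) -> str:
    """Return the URL a checker would query for release information.

    Real code would also need to escape special characters in `name`;
    that is left out to keep the sketch short.
    """
    try:
        template = API_TEMPLATES[backend]
    except KeyError:
        raise ValueError(f"no API template for backend {backend!r}") from None
    return template.format(name=name)


print(release_info_url("pypi", "requests"))
```

If the project were stored under a prettified or distro-style name instead of the forge name, the generated URL would point at a different (or nonexistent) package.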

The point you are raising is definitely of interest and is clearly one that has been/is biting us from the start. I don't think we're doing too bad but maybe we could do better, at least with documentation :)

pypingou, Oct 28 '16 12:10

Would probably be worth having an ability to reduce results down to those in a given backend/ecosystem by name in the API.

For instance, I want to make a suggestion to metacpan.org that they use anitya's data to extend the website to show users what various upstream distributions are referred to as on different vendors, to create streamlined installation instructions.

But for that to work, you'd need a way to ask Anitya what the vendor names are for "CPAN Foo".

Currently the closest thing here is the /<distro>/<packagename> API endpoint, but that's limited, as it requires metacpan to already know what the package is called on that distro, which is sub-optimal.

kentfredric, Nov 27 '16 06:11

I think there is an idea for this on the ticket queue already, but if not, that sounds like a perfectly reasonable request which definitely deserves its own ticket :)

pypingou, Nov 28 '16 11:11

For me, this background was important data for consideration in regards to "naming".

Because unless your data strictly and consistently identifies both:

  1. The upstream ecosystem
  2. The name of the package upstream

Then you limit the ability to even have such a feature.

Hence, settling on that implementation is a precursor to having a bug/implementation that offers the related benefit.
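
A minimal Python sketch of the data shape this implies, assuming each project records its upstream ecosystem and upstream name alongside its distro mappings. The names `UpstreamKey`, `Project` and `vendor_names`, as well as the package names in the sample data, are hypothetical and only illustrate the "what are the vendor names for CPAN Foo" lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class UpstreamKey:
    ecosystem: str  # e.g. "cpan", "pypi"
    name: str       # the name used in that ecosystem


@dataclass
class Project:
    key: UpstreamKey
    # Maps distro name -> package name in that distro.
    distro_names: Dict[str, str] = field(default_factory=dict)


def vendor_names(projects: Iterable[Project], ecosystem: str, name: str) -> Dict[str, str]:
    """Return the distro -> package name mapping for an upstream package."""
    wanted = UpstreamKey(ecosystem, name)
    for project in projects:
        if project.key == wanted:
            return dict(project.distro_names)
    return {}


# Illustrative data only; these mappings are made up for the example.
projects = [
    Project(UpstreamKey("cpan", "Foo"), {"Fedora": "perl-Foo", "Debian": "libfoo-perl"}),
    Project(UpstreamKey("pypi", "foo"), {"Fedora": "python-foo"}),
]
print(vendor_names(projects, "cpan", "Foo"))
```

The lookup only works because the ecosystem and the upstream name are stored as data; if the project name were free-form, the caller would have to already know the distro-side name.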

kentfredric, Nov 28 '16 13:11