repology-updater

[discussion] unified scheme for snapshot versions

Open AMDmi3 opened this issue 6 years ago • 32 comments

TL;DR: see Summary below

So, we have support for normal versions, and we now also have special support for prerelease versions. However, we still have to ignore a lot of packages, most of which are snapshots. Sometimes snapshots are a necessary evil and cannot be avoided: for example, if the release has a fatal bug, or when upstream is dead but there are useful commits in the master branch.

Now I wonder if Repology can improve the situation by suggesting some kind of unified, unambiguous snapshot format, so that snapshot versions from different repos COULD be comparable.

The ideas on format:

  • It must be a post-known-version format. E.g. if the latest known version is 0.4.7, the snapshot must be 0.4.7something, not 0.4.8something, because there should be no guessing about what the next version will be.
  • It must contain a date. Git commit hashes are not monotonic, so they are meaningless in version numbers; revision numbers went away with svn, and they are difficult to count for DVCSes.
    • The date should be in some fixed, monotonic format. YYYYMMDD is an obvious choice.
    • The date should refer to the commit date, not the package creation date (otherwise comparing them is meaningless)
    • Since the date does not contain time, it should be in UTC to avoid misinterpretation due to timezone difference
  • It must support the case where there's no past version at all.

Well, I don't see many choices of format here; it's obviously 1.2.3somethingYYYYMMDD (or somethingYYYYMMDD when there's no past version). From Repology's point of view, it's the same as 1.2.3.somethingYYYYMMDD, so distros may use an additional dot at their discretion.
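A minimal Python sketch of building such a string, assuming the packager has the commit's Unix timestamp at hand (e.g. from `git log -1 --format=%ct`). The helper name `make_snapshot_version` is hypothetical, and `keyword` stands in for the still-undecided "something":

```python
from datetime import datetime, timezone


def make_snapshot_version(base: str, keyword: str, commit_timestamp: int) -> str:
    """Return BASEkeywordYYYYMMDD for a commit (hypothetical helper).

    The date comes from the commit, not from the packaging time, and is
    rendered in UTC so every maintainer gets the same string for the same
    commit. Pass base="" when the project has no past release.
    """
    commit_date = datetime.fromtimestamp(commit_timestamp, tz=timezone.utc)
    return f"{base}{keyword}{commit_date.strftime('%Y%m%d')}"


# A commit made at 23:30 UTC on 2017-09-28 is already Sep 29 in UTC+3,
# which is exactly the misinterpretation the UTC rule avoids.
ts = int(datetime(2017, 9, 28, 23, 30, tzinfo=timezone.utc).timestamp())
print(make_snapshot_version("0.4.7", "something", ts))  # 0.4.7something20170928
```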

So, we have to decide what to use as something.

  • We can't use most of the currently used keywords (git, svn, bzr), as they are ambiguous and often used with a "pre" meaning.
  • The only unambiguous keywords with a "post" meaning are currently patch and post (I've just discovered the latter and added support for it to libversion), but they are sometimes used upstream (hdf5, some Python ports).

So, either we have to invent a new keyword which has an apparent "post" meaning and is not used upstream, or we could use post or patch, ignoring their use upstream (which is not that wide). Inventing a keyword seems preferable. So, any ideas?

  • postsnap
  • post* (e.g. allow postgit, postsvn and anything else that begins with post, and compare them as equal)
  • plus

Additional thoughts:

  • I do not like the idea of forcing anything Repology-specific on repos at all: it doesn't look right, it creates tension, and it never works 100%. Instead, I'd gladly leave all snapshots ignored or fix them with rules on a per-project, on-demand basis.
  • However, I still think that suggesting and favoring explicit pre and post suffixes to git/svn/hg/cvs would be beneficial on a global scale.
  • Not all distros will be able to properly support this scheme anyway. Some don't like letters and tend to avoid them. Some allow letters but compare them to numbers in a fixed way, e.g. 1.2preXXX and 1.2postXXX would both be less than 1.2.
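To make the last point concrete, here is a toy comparator in Python. It is not libversion's algorithm; the rank order and the `POST_KEYWORDS` set are assumptions chosen for illustration. With `keyword_aware=False` every letter run sorts below the end of the version, so both `1.2pre…` and `1.2post…` land below `1.2`; with it on, post-keywords sort between `1.2` and `1.2.1`:

```python
import re

POST_KEYWORDS = ("post", "patch")  # hypothetical: words meaning "after the release"

# Toy ranks: pre-letters < end of version < post-letters < numbers.
PRE, END, POST, NUM = 0, 1, 2, 3


def _segment_key(seg: str, keyword_aware: bool):
    if seg.isdigit():
        return (NUM, int(seg))
    if keyword_aware and seg.lower().startswith(POST_KEYWORDS):
        # All post-words (postgit, postsvn, ...) compare equal, as in the post* idea.
        return (POST, "")
    return (PRE, seg.lower())


def version_key(version: str, keyword_aware: bool = True):
    # Split into digit and letter runs; dots and other separators are dropped,
    # so 4.7.postgit... and 4.7postgit... produce the same key.
    segments = re.findall(r"\d+|[A-Za-z]+", version)
    return [_segment_key(s, keyword_aware) for s in segments] + [(END, "")]


print(version_key("1.2post20170928", keyword_aware=False) < version_key("1.2", keyword_aware=False))  # True
print(version_key("1.2") < version_key("1.2post20170928") < version_key("1.2.1"))  # True
```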

Summary

When packaging snapshots, let's

  • use last known official version (not the supposed next one) as a base
  • add an explicit post keyword after it (anything is allowed after post, e.g. postgit, postsvn)
  • use UTC date of a snapshotted commit (not the date of packaging!)

The version of a snapshot which comes after the official 4.7 version may thus look like

  • 4.7postgit20170928 or
  • 4.7.postgit20170928 or
  • 4.7post20170928

See how it's better than:

  • 4.7git20170928 (it is not known whether the snapshot was taken before or after 4.7)
  • 4.7git1234f6a (commit hashes are meaningless in versions, as they are not monotonic; however, you may still append one: 4.7postgit20170928.1234f6a, and I think we can still make it comparable in a sane way)
  • 4.8git20170928, 4.8pre20160928 (you are guessing what the next version will be, and you may be mistaken. For instance, it may be 4.7.1, which would make the version go backwards)

Note that this scheme is not something synthetic and new; it's just a refinement of the widely used VERSIONwordDATE scheme which provides explicit and unambiguous information on the snapshot that was packaged. As a side effect, it makes it possible for Repology to compare these snapshots.

AMDmi3, Sep 28 '17

Nice write-up; seems like a proper solution.

I have one comment regarding "guessing" the next version. Often, 4.8git20170928 is "guessed" from the source code, where the author has already bumped the version from the last release 4.7, and it is reflected in the --version output or displayed when you run the program. I agree that there is still no guarantee that the next version will be called 4.8, but there is hope that it will at least not be below that version. 4.7post20170928 is a more universal and straightforward solution to this problem, although the "official" version might be higher.

blshkv, Sep 29 '17

Well, having the next version explicitly defined in the upstream code/documentation justifies using 'pre' somewhat, but there is still no guarantee that another version will not be released instead, messing everything up. The "post" way is bulletproof, though.

AMDmi3, Sep 29 '17

I've just run into the post suffix used in an actual official version:

https://pypi.python.org/pypi/flake8-builtins/1.0.post0

Which makes me think that the only option is a really verbose, unique suffix such as V.V.VpostsnapshotYYYYMMDD

AMDmi3, Dec 20 '17

I think you should take any standard version scheme and normalise all software to it. Software authors have way too many different creative ideas how to call their releases.

blshkv, Dec 20 '17

It is not possible.

AMDmi3, Dec 20 '17

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem with a bare date for snapshot versions when there isn't one?

What if you want to package a second snapshot from the same day? Maybe YYYY-MM-DD-1?

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositories! Have you invited the packaging communities of the major repositories? More opinions and ideas might be helpful, and they might be more willing to adopt this if they have had a chance to participate in the discussion.

davidak, Apr 29 '18

I disagree with the "-" idea. For long file names, it might wrap to a second line in some WMs and become unreadable. Also, these two extra chars do not bring any value.

blshkv, Apr 29 '18

these two extra chars do not bring any value

It brings the value of being more readable to humans. #accessibility

Also, I just randomly found this XKCD comic about ISO 8601 again.

[Image: XKCD comic "ISO 8601"]

davidak, Apr 29 '18

Well yeah, but I feel like you didn't read my reasons. We are talking about version numbers, not about date standards.

blshkv, Apr 30 '18

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

It doesn't have to be readable (though I don't see any readability problems with YYYYMMDD); it must be simple, unambiguous and close to schemes which are already widely used.

repology=> select count(*) from packages where version ~ '20[0-9]{2}-[0-9]{2}-[0-9]{2}';
 count
-------
  1859
(1 row)

repology=> select count(*) from packages where version ~ '20[0-9]{6}';
 count
-------
 66379
(1 row)

Also, some repositories do not support dashes in versions.

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

Yes, because when the actual version is released, it would automatically be ordered after somethingYYYYMMDD, but not after YYYYMMDD; and also for uniformity's sake.

Actually, I've just found out that from libversion's perspective something1 is less than 0something1, while I'd expect them to be equal. This may be related to repology/libversion#14, but anyway we may want to require 0something to make it miscomparison-proof and less ambiguous. Or not, depending on how we and others do/want to handle versions like alpha1 (see below).

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem with a bare date for snapshot versions when there isn't one?

These cases should not be separated, as the proposed snapshot scheme must coexist with past and future versions. Any scheme without a prefix will break as soon as the first official version is released. So the proposal is to base all snapshots on some upstream version, or on 0 if there isn't one. Actually, even that will break if upstream releases e.g. alpha1, unless something is treated very specially (everywhere), which I'd like to avoid, to keep the scheme usable with any generic version comparison algorithm, even one not as elaborate as libversion.

What if you want to package a second change at one day? Maybe YYYY-MM-DD-1?

That's a very good question. The naive answer would be YYYYMMDD.1, but that can no longer be compared across different repositories. It seems to me that it can't be solved within the scheme at all, as any local suffix will break cross-repository comparison, and complicating the scheme by adding more time resolution would hinder its adoption.

Actually, most repositories have local package revisions which could be used for this purpose. I guess the scheme should suggest using revisions, while libversion could handle snapshots specially and ignore everything past the date. This is OK, since the special handling would only be required in libversion, all local algorithms will still be OK with handling suffixes normally.
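A Python sketch of that special handling, under the assumption that snapshots follow the BASEpost<word>YYYYMMDD shape. The comparison key keeps only the base version and the date, so local suffixes such as `.1`, `.2` or a hash no longer affect cross-repository comparison. `snapshot_key` is a hypothetical name; this is not libversion code:

```python
import re

# Hypothetical prefix pattern: base version, optional dot, post<word>, eight-digit date.
# Anything after the date (".1", ".2", ".1234f6a", ...) is deliberately ignored.
_SNAPSHOT_PREFIX = re.compile(r"^(.*?)\.?post[a-z]*(\d{8})")


def snapshot_key(version: str):
    """Return (base, YYYYMMDD) for a snapshot version, or None for anything else."""
    m = _SNAPSHOT_PREFIX.match(version)
    return (m.group(1), m.group(2)) if m else None


# Two same-day snapshots from different repositories compare equal...
print(snapshot_key("1.2.3post20180505.1") == snapshot_key("1.2.3.postgit20180505.2"))  # True
# ...while a later date still sorts later when the base versions match.
print(snapshot_key("1.2.3post20180505") < snapshot_key("1.2.3post20180506"))  # True
```

Only the dates should be ordered this way; the base parts must be equal (or compared with a real version algorithm), since tuple comparison treats them as plain strings.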

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositories! Have you invited the packaging communities of the major repositories? More opinions and ideas might be helpful, and they might be more willing to adopt this if they have had a chance to participate in the discussion.

Not yet. I'm sure this topic will come up when repology is used by more people.

AMDmi3, May 05 '18

Returning to this, an alternative solution would be for individual repositories to convey the information that they are packaging a snapshot. As soon as we have this flag and a snapshot date, we could compare snapshots specially, by comparing dates instead of versions.

It could be further improved:

  • Introduce a grace period, and don't consider a snapshot outdated if it is older than the freshest one by less than, say, a few weeks. This would prevent Repology from encouraging races and too frequent updates (which IMO is bad).
    • We may also measure this period (or introduce another, longer one) from the current time instead of from the latest snapshot time, to encourage eventual infrequent snapshot updates (e.g. to the latest commit or to the latest existing snapshot).
  • Allow official releases with an accurate date available to outdate all snapshots immediately.
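The grace-period rules above could look roughly like this in Python. The four-week value and the function name are placeholders chosen for this sketch (the text only says "some weeks"), and the release rule is read as applying to snapshots taken before the release:

```python
from datetime import date, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(weeks=4)  # placeholder for "some weeks"


def snapshot_is_outdated(snapshot: date, freshest_snapshot: date,
                         latest_release: Optional[date] = None,
                         grace: timedelta = GRACE_PERIOD) -> bool:
    """Decide whether a snapshot should be flagged as outdated (hypothetical rule set)."""
    # An official release with a known date newer than the snapshot outdates it
    # immediately, with no grace period.
    if latest_release is not None and latest_release > snapshot:
        return True
    # Otherwise only flag snapshots lagging the freshest one by more than the grace
    # period, so small differences don't encourage update races.
    return freshest_snapshot - snapshot > grace


print(snapshot_is_outdated(date(2019, 8, 1), date(2019, 8, 20)))  # False: 19 days behind
print(snapshot_is_outdated(date(2019, 6, 1), date(2019, 8, 20)))  # True: 80 days behind
```

The sub-bullet's variant would measure against the current date (`date.today()`) instead of `freshest_snapshot`, possibly with a longer period.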

After repology/repology-rules#20 is done (not even started yet), we'll have all snapshots which use date version marked up, so we can extract this information from them. However if any repository wishes to convey this information directly, it's most welcome and could be used right away.

Repology would need, roughly,

  • an indication that the package is a snapshot. This alone is enough to avoid introducing fake versions and upsetting users.
  • a snapshot date (since we're going to use a grace period, it doesn't need to be accurate, and YYYYMMDD in any time zone is quite enough; note, however, that some repos, namely openSUSE, provide snapshot versions with second accuracy (ISO 8601 time format or epoch seconds))
  • the last official version before the snapshot

There are multiple ways to convey this data. The simplest one would be to just add a date suffix to the version (1.2.3.20190101), like most repositories already do (however, it needs to be used consistently), and introduce a snapshot flag. This would be enough for Repology to handle snapshots consistently.
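A minimal Python sketch of that simplest option, assuming a repository exports a version string with a `.YYYYMMDD` suffix plus a boolean snapshot flag. `PackageVersion` and `split_snapshot` are hypothetical names, not part of any Repology interface:

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class PackageVersion:
    version: str       # e.g. "1.2.3.20190101"
    is_snapshot: bool  # flag conveyed by the repository


_DATE_SUFFIX = re.compile(r"^(?P<base>.+?)\.(?P<y>\d{4})(?P<m>\d{2})(?P<d>\d{2})$")


def split_snapshot(pkg: PackageVersion) -> Optional[Tuple[str, date]]:
    """Return (last official version, snapshot date) for flagged snapshots, else None."""
    if not pkg.is_snapshot:
        return None  # without the flag, 1.2.3.20190101 could be a real upstream version
    m = _DATE_SUFFIX.match(pkg.version)
    if m is None:
        return None
    # date() raises ValueError for impossible dates such as 20191340.
    return m["base"], date(int(m["y"]), int(m["m"]), int(m["d"]))


print(split_snapshot(PackageVersion("1.2.3.20190101", True)))  # ('1.2.3', datetime.date(2019, 1, 1))
```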

AMDmi3, Aug 21 '19

Gentoo has a very clear policy: https://wiki.gentoo.org/wiki/Project:ComRel/Developer_Handbook/Ebuild_policy

foo-x.y_preYYYYMMDD.ebuild
foo-x.y_pYYYYMMDD.ebuild

BUT ;-) there is an exception when the upstream did not release any version and x.y is not specified. In this case, the foo-YYYYMMDD.ebuild is used. I could not find any place for the "snapshot" flag.

So as a generic rule, you can search for the suffix YYYYMMDD.ebuild
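A rough Python sketch of that search, based only on the three filename patterns quoted from the Gentoo policy; the optional `-rN` suffix is Gentoo's usual ebuild revision. The function name is hypothetical:

```python
import re

_EBUILD_SNAPSHOT = re.compile(
    r"^(?P<pn>.+?)-"                              # package name
    r"(?:(?P<base>\d[\d.]*)_(?P<kind>pre|p))?"    # optional x.y_pre / x.y_p
    r"(?P<date>\d{8})"                            # YYYYMMDD
    r"(?:-r\d+)?\.ebuild$"                        # optional ebuild revision
)


def classify_ebuild(filename: str):
    """Return (base, kind, date) for a dated ebuild filename, or None."""
    m = _EBUILD_SNAPSHOT.match(filename)
    return (m["base"], m["kind"], m["date"]) if m else None


print(classify_ebuild("foo-1.2_pre20190822.ebuild"))   # ('1.2', 'pre', '20190822')
print(classify_ebuild("foo-1.2_p20190822-r1.ebuild"))  # ('1.2', 'p', '20190822')
print(classify_ebuild("foo-20190822.ebuild"))          # (None, None, '20190822')
```

A match proves little on its own: a bare `foo-YYYYMMDD.ebuild` looks exactly like a package whose upstream uses dates as versions.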

blshkv, Aug 22 '19

Gentoo's policy is no better than other repositories using random suffixes: it confuses upstream versions that use p with snapshots, it allows pre with nonexistent upstream versions, and YYYYMMDD is indistinguishable from upstream versions that look the same way.

AMDmi3, Aug 22 '19

I disagree with the "-" idea. For long file names, it might wrap to a second line in some WMs and become unreadable. Also, these two extra chars do not bring any value.

Also, in RPM - is a separator equivalent to .: it will split the version in unwanted places and lead to part-by-part comparison of the date components and whatever follows them, instead of treating the whole date as one value.
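To illustrate the splitting, here is a simplified Python version of the segmentation step. rpmvercmp treats non-alphanumeric characters as separators and compares the remaining digit and letter runs one by one; its special handling of `~` and `^` is left out here, and the function name is hypothetical:

```python
import re


def rpm_like_segments(version: str) -> list:
    """Split a version into digit and letter runs, dropping every separator character."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


print(rpm_like_segments("1.2post20180505"))    # ['1', '2', 'post', '20180505']
print(rpm_like_segments("1.2post2018-05-05"))  # ['1', '2', 'post', '2018', '05', '05']
```

With the dashed form, the date turns into three separate segments compared one after another; the compact form keeps it as a single number.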

mikhailnov, Jan 14 '20

FYI, in ALT we promote the following versioning scheme for git snapshots, which is based on the idea implemented in https://git.savannah.gnu.org/cgit/gnulib.git/plain/build-aux/git-version-gen (which in turn is used in many projects): if "git describe --abbrev=1" of the upstream commit is VERSION-NUMBER-gHASH, then the package version has to be VERSION.0.NUMBER.HASH. Simples!
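The mapping described above is mechanical. A Python sketch, with a hypothetical function name, assuming the input is the plain `VERSION-NUMBER-gHASH` form (`git describe` prints just the tag name when the commit is exactly at a tag):

```python
import re

# VERSION-NUMBER-gHASH as printed by `git describe`; VERSION itself may contain dashes.
_DESCRIBE = re.compile(r"^(?P<version>.+)-(?P<number>\d+)-g(?P<hash>[0-9a-f]+)$")


def alt_snapshot_version(describe: str) -> str:
    """Map VERSION-NUMBER-gHASH to the VERSION.0.NUMBER.HASH scheme described above."""
    m = _DESCRIBE.match(describe)
    if m is None:
        raise ValueError(f"not in VERSION-NUMBER-gHASH form: {describe!r}")
    return f"{m['version']}.0.{m['number']}.{m['hash']}"


print(alt_snapshot_version("0.185-54-gb561"))  # 0.185.0.54.b561
```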

ldv-alt, Aug 19 '20

Apologies for necro-bumping!

I'm subscribing here, because we at Nixpkgs are struggling with a similar problem. The format I am using is something like x.y.z+unstable=YYYY-MM-DD; however, it is still in the "brainstorm phase".

(late edits to reflect the current state - thanks @davidak for the reminder)

AndersonTorres, Sep 07 '21

@ldv-alt there's a rule from back in 2018 which marks that specific scheme as incorrect. Thankfully, that scheme hasn't gained wide adoption, as it's horrible in all respects: it doesn't separate upstream and snapshot parts, it's needlessly long, and it uses commit hashes. It also violates RPM version policy. Actually, the whole of Sisyphus is currently pessimized for providing an intolerable amount of fake versions (apart from snapshots, for which there's also nothing close to a single format).

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

AMDmi3, Sep 08 '21

I think there is still no decision which format should be used in NixOS. It would be great if this issue results in a recommendation.

davidak, Sep 08 '21

The recommendation is in the issue body.

AMDmi3, Sep 08 '21

Also violates RPM version policy.

@AMDmi3 Please elaborate.

ldv-alt, Sep 08 '21

09.09.2021 01:41, Dmitry V. Levin wrote:

Also violates RPM version policy.

@AMDmi3 Please elaborate.

+1

mikhailnov, Sep 09 '21

@mikhailnov @ldv-alt

It is mentioned in ALT's own docs: https://www.altlinux.org/Spec#Промежуточные_upstream-релизы

It was mentioned in the Fedora packaging guidelines, but it turns out it's now, thankfully, deprecated: https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_traditional_versioning_with_part_of_the_upstream_version_information_in_the_release_field https://web.archive.org/web/20181211075036/https://fedoraproject.org/wiki/Packaging:Versioning#Prerelease_versions

AMDmi3, Sep 09 '21

@mikhailnov @ldv-alt

It is mentioned in ALT's own docs: https://www.altlinux.org/Spec#Промежуточные_upstream-релизы

I'm sorry to correct you, but the wiki page you're referencing is not a policy, let alone an RPM policy.

It was mentioned in the Fedora packaging guidelines, but it turns out it's now, thankfully, deprecated: https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_traditional_versioning_with_part_of_the_upstream_version_information_in_the_release_field https://web.archive.org/web/20181211075036/https://fedoraproject.org/wiki/Packaging:Versioning#Prerelease_versions

I'm sorry to correct you, but the Fedora document you're referencing is not an RPM policy.

Anyway, RPM permits the kind of versioning I recommend for use in case of git snapshots, and ALT packaging policies have nothing against it.

Like it or not, the versioning scheme I recommend for git snapshots has its benefits and its users. Your opposition to this scheme is clear, but I respectfully disagree. Anyway, it's up to distros to choose their packaging policies, and ALT has chosen the scheme you don't like. Let's agree to disagree on this subject.

ldv-alt, Sep 09 '21

Well, all I'm going to say is that this scheme will never be honored by Repology, because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

AMDmi3, Sep 09 '21

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

I am drafting an RFC for the NixOS community/organization. Until then, the mess will remain.

AndersonTorres, Sep 09 '21

Well, all I'm going to say is that this scheme will never be honored by Repology, because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes, I do not agree that they cannot be meaningfully compared with upstream versions, and you do not compare different snapshots with each other anyway.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

ldv-alt, Sep 09 '21

Adding YYYY-MM-DD to the version requires manual work.

Here is an example of how git snapshot can be packaged: https://abf.io/import/gimagereader/blob/a83f21be3b/gimagereader.spec

%define commit d3cdd00b3e848867d95db28354afc41814d5dd0c
%define commit_short %(echo %{commit} | head -c 5)
Version:	3.3.1
Release:	2.git%{commit_short}.3
Source0:	https://github.com/manisandro/gImageReader/archive/%{commit}.tar.gz?/gImageReader-%{commit}.tar.gz

The Release tag consists of three parts. When upgrading to a new git snapshot, the first number is increased; when rebuilding an existing snapshot, the last number is increased.
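A Python sketch of those two bump operations on a Release value shaped like `2.git<short>.3`. The five-character short hash mirrors `head -c 5` in the spec above; restarting the last number at 1 on a new snapshot is an assumption, since the comment does not say what happens to it. `bump_release` is a hypothetical helper:

```python
import re
from typing import Optional

_RELEASE = re.compile(r"^(?P<snap>\d+)\.git(?P<short>[0-9a-f]+)\.(?P<build>\d+)$")


def bump_release(release: str, new_commit: Optional[str] = None) -> str:
    """Rebuild the same snapshot (new_commit=None) or move to a new snapshot commit."""
    m = _RELEASE.match(release)
    if m is None:
        raise ValueError(f"unexpected Release format: {release!r}")
    snap, build = int(m["snap"]), int(m["build"])
    if new_commit is None:
        # Rebuilding the same snapshot: only the last number grows.
        return f"{snap}.git{m['short']}.{build + 1}"
    # New snapshot: first number grows, short hash changes; build restarts at 1 (assumed).
    return f"{snap + 1}.git{new_commit[:5]}.1"


print(bump_release("2.gitd3cdd.3"))  # 2.gitd3cdd.4
```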

As a package maintainer, I just go to GitHub or another place, study the commit history, copy the commit hash, change it in the spec file, then run spectool -g *.spec && rm -fv .abf.yml && abf put, and that's all. I have neither the wish nor the time to maintain the correct date of the git commit from which the snapshot was built. I would probably maintain it, but it would not actually help either users or projects like Repology (or am I wrong, would it help?).

I think other maintainers think similarly, and that is why I would not expect wide adoption of naming schemes which require additional useless work like tracking dates.

mikhailnov, Sep 10 '21

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes

No, they are not. Unlike any other snapshot schemes I've seen, they are completely indistinguishable. There is not a single property which can be reliably used to tell them from official versions.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

Yes.

Adding YYYY-MM-DD to the version requires manual work.

I've never required adding YYYY-MM-DD to the version.

Here is an example of how git snapshot can be packaged:

There is no problem with this specific case at all, as a) it's based upon an official version and b) it's clearly distinguishable as a snapshot (by the presence of git in Release).

For instance, Repology can (and does) safely treat it as the unmodified version, which won't generate a nonexistent release, will be marked newest/outdated correctly and can be compared to NVD with release granularity. The lack of a date prevents it from being compared with higher granularity, but we don't do that anyway, and I don't think we should or will.

However, it is still pessimized in that this version will not be treated as new if it only comes from an RPM distro. Because the above-mentioned "not policies" are widely used, there's no telling whether the snapshot is based upon a real release or upon a fake "next" release, as the "not policies" suggest. There's no way to tell that from Release starting with 0 either, because these are not policies.

AMDmi3, Sep 10 '21

Ah, thanks, I think I understood, so if ALT's version-release was VERSION.0.NUMBER.gitHASH instead of VERSION.0.NUMBER.HASH, it would be recognizable as a git snapshot.

mikhailnov, Sep 10 '21

While the other problems with it remain, yes: at least it would be possible to reliably tell that it's not an upstream version. It still wouldn't allow telling it apart from snapshots which can be compared to upstream, though.

AMDmi3, Sep 10 '21

On Fri, Sep 10, 2021 at 04:24:13AM -0700, Dmitry Marakasov wrote:

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes

No, they are not. Unlike any other snapshot schemes I've seen, they are completely indistinguishable. There is not a single property which can be reliably used to tell them from official versions.

These versions are upstream versions followed by .0.distance.digest suffix where distance is a decimal number and digest consists of at least 4 hexadecimal digits, so they are clearly recognizable.

For example:

$ rpmquery elfutils
elfutils-0.185.0.54.b561-alt1.x86_64
$ rpmquery --qf '%{version}\n' elfutils |
  sed -E 's/^(.+).0.([[:digit:]]+).([[:xdigit:]]{4,})$/\1\t\2\t\3/'
0.185	54	b561

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

Yes.

Unfortunately, such a blanket ban approach makes the whole repology.org untrustworthy.

ldv-alt, Sep 10 '21

These versions are upstream versions followed by .0.distance.digest suffix where distance is a decimal number and digest consists of at least 4 hexadecimal digits, so they are clearly recognizable.

No, they are not. As can be seen from the link already given above, the probability is quite high that these hexadecimal digits consist only of decimal digits, making a snapshot indistinguishable from a legal dot-separated numeric version:

1.18.0.27.0405
4.8.0.0.10.1157
2.6.4.0.88.9801
1.0.1.0.8.5087
2.13.0.5.8107
0.12.0.3.4174

Versions like these are used in the wild, in case you wonder.

In some other cases, even if a hash contains [a-f], it's still indistinguishable from legal prerelease or letter-suffixed version:

0.185.0.54.b561
4.06.0.7.100b
4.8.0.7.b352
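The ambiguity can be checked mechanically. This Python sketch applies the suffix rule as stated above (upstream version, `.0.`, a decimal distance, a dot and at least four hex digits, with the dots escaped) to every example in both lists. All of them match, and the first six also pass a plain dotted-numeric check, so a pattern match alone cannot prove that a version is a snapshot:

```python
import re

# The suffix rule as described: VERSION.0.DISTANCE.DIGEST, digest >= 4 hex digits.
ALT_SNAPSHOT = re.compile(r"^(.+)\.0\.(\d+)\.([0-9A-Fa-f]{4,})$")
PLAIN_NUMERIC = re.compile(r"^\d+(?:\.\d+)*$")

examples = [
    "1.18.0.27.0405", "4.8.0.0.10.1157", "2.6.4.0.88.9801",
    "1.0.1.0.8.5087", "2.13.0.5.8107", "0.12.0.3.4174",
    "0.185.0.54.b561", "4.06.0.7.100b", "4.8.0.7.b352",
]

for v in examples:
    looks_like_snapshot = ALT_SNAPSHOT.match(v) is not None
    also_plain_release = PLAIN_NUMERIC.match(v) is not None
    print(f"{v:18} snapshot-pattern={looks_like_snapshot} plain-numeric={also_plain_release}")
```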

Unfortunately, such a blanket ban approach makes the whole repology.org untrustworthy.

The very first thing Repology must do to be trustworthy is to prevent garbage from a misbehaving repository from being reported as a new upstream version to all other maintainers, and that we do. As I've already mentioned, though, the discussed scheme is neither the only nor the main cause for the ban; the amount of random made-up versions from Sisyphus is, as the repository is the worst by the number of ignore rules I've had to add and maintain:

% grep -R sisyphus repology-rules/900.version-fixes | wc -l
     301

by the number of known incorrect versions:

repology=> select repo, count(distinct effname) from packages where versionclass = INCORRECT() group by repo order by count desc limit 10;
       repo       | count 
------------------+-------
 alt_p9           |    90
 alt_p10          |    86
 altsisyphus      |    83
 funtoo_1.4       |    69
 nix_unstable     |    68
 raspbian_testing |    67
 nix_stable       |    67
 gentoo           |    66
 raspbian_stable  |    65
 debian_unstable  |    61
(10 rows)

and by the number of complaints, i.e. cases which actually affect users:

repology=> select count(*) from reports where comment ilike '%sisyphus%';
 count 
-------
    47
(1 row)

So please don't mention untrustworthiness.

AMDmi3, Sep 10 '21