rs-versions icon indicating copy to clipboard operation
rs-versions copied to clipboard

Versions cannot contain spaces

Open russellbanks opened this issue 1 year ago • 10 comments

I had a user report of an error with parsing a version:

  • https://github.com/russellbanks/Komac/issues/446
  • https://github.com/russellbanks/Komac/issues/447

Both of these are due to the same issue of being unable to parse a version with spaces in. The version in question looks like 22.01 ZS v1.5.5 R2 (See winget-pkgs/manifests/m/mcmilk/7zip-zstd). It works correctly if the spaces are replaced with a delimiter, such as -.

Versioning::from_str("22.01 ZS v1.5.5 R2").unwrap(); // IllegalVersioning("22.01 ZS v1.5.5 R2")

russellbanks avatar Feb 03 '24 20:02 russellbanks

Spaces! My lord what has the world come to.

Okay I've looked at the associated issues on your repository, as well as the 7zip releases page. It seems that the title of the releases uses spaces, but the git tag has proper - deliminations. I gather your tool is pulling from their releases via the title, then? Could you fill me in with some of the background information about how these version numbers are gathered and how they're used for further processing?

Accepting spaces in version numbers (even in the Mess type) may add some serious complexity, especially since the parsing behaviour of all three subtypes (SemVer, Version, and Mess) propagate up to the parent, Versioning. If I allow spaces, then the line between what's a version and what isn't (given say a line of words) becomes very blurry.

fosskers avatar Feb 03 '24 23:02 fosskers

My tool allows users to add or update applications to winget (Microsoft's package manager for Windows). Each application has some metadata files and in those there is a package version (Example). The package versions at winget-pkgs/manifests/m/mcmilk/7zip-zstd are already pre-existing versions for 7zip-zstd in winget. This is where my tool gets the versions from (and also a new version from the user if they're adding a new version). Because Microsoft has very little limitations on what is a valid version, my tool needs to be able to parse and order any version of any app that is submitted to winget so that I can know which metadata file is the latest to build upon.

In Microsoft's schema for this, they actually just have a regex for validating the version:

"PackageVersion": {
  "type": "string",
  "pattern": "^[^\\\\/:\\*\\?\"<>\\|\\x01-\\x1f]+$",
  "maxLength": 128,
  "description": "The package version"
},

In short, the regex just says it can't contain some specific control characters and beyond that, what constitutes a version can be anything, and winget will attempt to order it regardless.

russellbanks avatar Feb 03 '24 23:02 russellbanks

The following test passes:

    #[test]
    fn mess_7zip() {
        cmp_messes("22.01-ZS-v1.5.5-R2", "22.01-ZS-v1.5.6-R2");
    }

    fn cmp_messes(a: &str, b: &str) {
        let x = Mess::new(a).unwrap();
        let y = Mess::new(b).unwrap();

        assert!(x < y, "{} < {}", x, y);
    }

But so does:

        cmp_messes("22.01-ZS-v1.5.5-R2", "24.02-ZS-v1.6.0");

Since even with the - injected in, these versions are deep into lawlessness seemingly mixing calver, semver, and other noise, and so must be parsed as a Mess. Note that here, the calver section basically dominates the comparison, but this may be sufficient for your purposes for comparing at least this particular edge case.

I think it's best to remain firm that whitespaces shouldn't be accepted in version numbers, even mostly lawless ones as we see here. If they are present, I think it's reasonable to do some preprocessing in downstream code before parsing.

fosskers avatar Feb 15 '24 00:02 fosskers

Preprocessing is unfeasible in my case as I take in a version straight from a clap parameter and serialize it into Yaml. Even if I were to do some changes to the version, I need it to be the exact same as what the user inputted to what value is serialized in the end. If it were more feasible, it would also add a lot of unnecessary complexity just to workaround this issue of mess not working with spaces.

I'd really appreciate it if you could add support for spaces - it would arguably make this library more robust with the intention of being able to parse any messy version and still be able to compare it.

russellbanks avatar Feb 15 '24 08:02 russellbanks

Let's discuss this. Were I to support spaces in Mess, there are two breaking changes that come to mind:

  1. The addition of a variant in the Sep enum. These kinds of changes, while technically breaking, almost certainly affect no one but me. I doubt anyone is actually pattern matching on Sep anywhere in real code.
  2. The behaviour of the exposed Mess::parse nom parser. Since this could technically be plugged into any downstream parser combination. A sentence like Hello 1.2.3 Rust that expected the structure word <> version <> word would now fail, with the Rust now being parsed as part of the version. Similar to the above: is anybody actually doing this anywhere? I don't know, but it might not be impossible to individually check: https://crates.io/crates/versions/reverse_dependencies

fosskers avatar Feb 15 '24 09:02 fosskers

  1. The addition of a variant in the Sep enum.

I do think it's unlikely that anyone is matching over Sep. If anyone were, they could either use _ when matching or add the space seperator to their match. However, I've checked the 18 packages in the reverse dependencies of this crate on crates.io by searching the code on their respective GitHub repositories for versions:: and none of them import or match Sep.

  1. The behaviour of the exposed Mess::parse nom parser.

Again, from checking the reverse dependencies, no one appears to be using this. The usages I saw were mostly Versioning, SemVer, and a couple usages of Chunk. Granted these are only ones that are on crates.io, but although this may technically be breaking, I can't see it having an impact or even affecting any code bases.

russellbanks avatar Feb 15 '24 16:02 russellbanks

I've since had another user report an instance of an app having a version with spaces in it: 7.3 2022-02-28 r5338 (sf-7.3-1) - https://github.com/russellbanks/Komac/issues/447#issuecomment-1952944709

russellbanks avatar Feb 19 '24 18:02 russellbanks

Thanks for the follow-up. I do intend to solve this as soon as I'm able.

fosskers avatar Feb 20 '24 00:02 fosskers

Hello, I'm running into this as well. I'm using Versioning to compare PostgreSQL version numbers, to determine if an upgrade is needed.

The output looks a bit like this:

$ psql postgres --pset format=csv -c 'SHOW server_version'
server_version
14.12 (Homebrew)

Here, Homebrew is a suffix that can basically be disregarded. (I probably wouldn't want 14.12 (Homebrew) to compare greater or less than 14.12 (NixOS), for instance.)

I can work around this by removing the suffix, but I thought I'd make a comment here.

It might be nice to have a "permissive parser" which ignores or otherwise segments out extra information like this.

9999years avatar Jun 21 '24 23:06 9999years

Hi there. In this case I'd recommend a preemptive call to https://doc.rust-lang.org/std/primitive.str.html#method.split_once to remove anything past the first whitespace, and then call Versioning::new.

fosskers avatar Jun 22 '24 01:06 fosskers