crates.io icon indicating copy to clipboard operation
crates.io copied to clipboard

Add `semver_ord(num)` SQL function and `version.semver_ord` column

Open Turbo87 opened this issue 10 months ago • 6 comments

We currently rely on the Rust semver crate to implement our "sort by semantic versioning" functionality, that is used by web interface, but also to determine the "default version". This has the downside that we need to load the full list of version numbers for a crate from the database to the API server, sort it and then throw away the ones that we don't need.

Ideally, we would be using https://pgxn.org/dist/semver/ or https://github.com/pgcentralfoundation/pgrx to solve this, but unfortunately most PostgreSQL hosters don't allow/support custom extensions, and it would also make local development a bit more challenging.

As a workaround, this PR implements a semver_ord(num) pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty.

Update: After discussion in this PR we changed the maximum number of prerelease specifiers to 10, but the remaining specifiers are now appended to the array as a string, to at least ensure uniqueness and a stable sort order.

The database migration in this commit also adds a new semver_ord column to the versions table, and an on-insert trigger function that automatically derives the semver_ord column from the num column value:

Bildschirmfoto 2025-03-06 um 08 46 02

Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed:

with versions_to_update as (
    select id, num
    from versions
    where semver_ord = 'null'::jsonb
    limit 1000
)
update versions
    set semver_ord = semver_ord(num)
    where id in (select id from versions_to_update);

This PR does not yet implement any code to actually use this new column, since it will need to be backfilled first. Once that has happened and verified to produce correct results, we can start to migrate our codebase to move the semver sorting into the database and potentially even calculate the "default version" directly in the database too.

Turbo87 avatar Mar 06 '25 07:03 Turbo87

@Gankra you know a lot about semver edge cases. do you want to try and break this before we actually start using the new column? :D

Turbo87 avatar Mar 07 '25 09:03 Turbo87

ahahahaha

I think as long as it handles https://crates.io/crates/cursed-trying-to-break-cargo/ ok there's not much else for me to throw at it?

Gankra avatar Mar 07 '25 14:03 Gankra

Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.

Gankra avatar Mar 07 '25 14:03 Gankra

I think as long as it handles https://crates.io/crates/cursed-trying-to-break-cargo/ ok there's not much else for me to throw at it?

it's looking good so far :)

Turbo87 avatar Mar 07 '25 14:03 Turbo87

Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.

are you sure you were in semver sorting mode? https://crates.io/crates/cursed-trying-to-break-cargo/versions?sort=semver looks okayish to me 😅

Turbo87 avatar Mar 07 '25 15:03 Turbo87

Oh you're totally right, it's so rare to see date and semver not be 1:1 for crates I forgot date is the default :)

Gankra avatar Mar 07 '25 19:03 Gankra