Add `semver_ord(num)` SQL function and `version.semver_ord` column
We currently rely on the Rust semver crate to implement our "sort by semantic versioning" functionality, that is used by web interface, but also to determine the "default version". This has the downside that we need to load the full list of version numbers for a crate from the database to the API server, sort it and then throw away the ones that we don't need.
Ideally, we would be using https://pgxn.org/dist/semver/ or https://github.com/pgcentralfoundation/pgrx to solve this, but unfortunately most PostgreSQL hosters don't allow/support custom extensions, and it would also make local development a bit more challenging.
As a workaround, this PR implements a semver_ord(num) pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty.
Update: After discussion in this PR we changed the maximum number of prerelease specifiers to 10, but the remaining specifiers are now appended to the array as a string, to at least ensure uniqueness and a stable sort order.
The database migration in this commit also adds a new semver_ord column to the versions table, and an on-insert trigger function that automatically derives the semver_ord column from the num column value:
Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed:
with versions_to_update as (
select id, num
from versions
where semver_ord = 'null'::jsonb
limit 1000
)
update versions
set semver_ord = semver_ord(num)
where id in (select id from versions_to_update);
This PR does not yet implement any code to actually use this new column, since it will need to be backfilled first. Once that has happened and verified to produce correct results, we can start to migrate our codebase to move the semver sorting into the database and potentially even calculate the "default version" directly in the database too.
@Gankra you know a lot about semver edge cases. do you want to try and break this before we actually start using the new column? :D
ahahahaha
I think as long as it handles https://crates.io/crates/cursed-trying-to-break-cargo/ ok there's not much else for me to throw at it?
Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.
I think as long as it handles https://crates.io/crates/cursed-trying-to-break-cargo/ ok there's not much else for me to throw at it?
it's looking good so far :)
Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.
are you sure you were in semver sorting mode? https://crates.io/crates/cursed-trying-to-break-cargo/versions?sort=semver looks okayish to me 😅
Oh you're totally right, it's so rare to see date and semver not be 1:1 for crates I forgot date is the default :)