Consistency check with crates.io database

Open jyn514 opened this issue 5 years ago • 3 comments

docs.rs looks at the crates.io index the first time a crate is released, but never again after that. This means that if a crate is deleted from the index, its documentation stays up (e.g. https://github.com/rust-lang/docs.rs/issues/765). It would be great to have a way to compare the docs.rs database with the crates.io index to make sure they match. It should start by verifying that the name/version pairs match, but could be expanded to also check that the authors are consistent.
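As a rough illustration of the first step (comparing name/version pairs), here is a minimal sketch that treats both sources as sets and reports what each side is missing. The `Release` alias, the `PairDiff` struct and the `diff_pairs` function are hypothetical names chosen for this example; they are not taken from the docs.rs codebase, and loading the two sets from the index and the database is left out.

```rust
use std::collections::BTreeSet;

/// A (crate name, version) pair, e.g. ("serde", "1.0.0").
/// Hypothetical alias used only in this sketch.
type Release = (String, String);

/// Result of comparing the two sources by name/version pair only.
#[derive(Debug, PartialEq)]
struct PairDiff {
    /// Present in the crates.io index but absent from the docs.rs database.
    missing_in_db: Vec<Release>,
    /// Present in the docs.rs database but absent from the crates.io index.
    missing_in_index: Vec<Release>,
}

/// Computes both set differences. `BTreeSet` keeps the output sorted,
/// which makes repeated runs easy to compare.
fn diff_pairs(index: &BTreeSet<Release>, db: &BTreeSet<Release>) -> PairDiff {
    PairDiff {
        missing_in_db: index.difference(db).cloned().collect(),
        missing_in_index: db.difference(index).cloned().collect(),
    }
}
```

Calling `diff_pairs(&index, &db)` with the two sets gives the releases that would need a build and the releases whose documentation should no longer be served. A real check would also have to compare yank status, which this sketch ignores.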

Note that the author check is a little tricky, since we currently store authors in two different places: author_rels as a database relation and releases.authors as JSON. Before implementing the consistency check, we should refactor the database to only use author_rels.

jyn514 avatar May 21 '20 13:05 jyn514

Doing much more than #898 does is going to necessitate reading the Cargo.toml of each crate and/or asking crates.io about them (the only other thing I think we can verify from the index itself is the dependencies of each version).

Doing this via the API would be spammy, so it's probably best to work from something like the database dumps.

Nemo157 avatar Jul 18 '20 14:07 Nemo157

Database dumps sound fine, 24 hours is more than recent enough.

jyn514 avatar Jul 18 '20 15:07 jyn514

Note that a big chunk of the consistency check is solved in https://github.com/rust-lang/docs.rs/pull/1990.

The part missing in the logic is the information from the crates.io API.

Also, we haven't run the check yet; that is blocked on https://github.com/rust-lang/docs.rs/issues/1011. My first run of the check would have requeued around 18k releases that previously failed because of (for example) wrong metadata. I would prefer to requeue them only once doing so leaves a valid build-attempt entry in the database afterwards, so that we can re-run the consistency check regularly without re-queuing these 18k releases every time.
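To illustrate why a recorded build attempt makes the check safe to repeat, here is a minimal sketch, assuming the database can tell us which releases have any build attempt on record, failed or not. The `Release` alias and the `releases_to_queue` function are hypothetical and only model the idea; they do not describe how docs.rs stores build attempts.

```rust
use std::collections::HashSet;

/// A (crate name, version) pair. Hypothetical alias used only in this sketch.
type Release = (String, String);

/// Returns the releases from the index that have no recorded build attempt
/// (successful or failed). Once a failed build is recorded as an attempt,
/// the release drops out of this list on the next run.
fn releases_to_queue(index: &[Release], attempted: &HashSet<Release>) -> Vec<Release> {
    index
        .iter()
        .filter(|r| !attempted.contains(*r))
        .cloned()
        .collect()
}
```

With this shape, a release whose build fails because of bad metadata is queued once, and later runs of the check leave it alone as long as the failed attempt stays on record.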

syphar avatar Oct 24 '23 08:10 syphar

After #1011 was mostly done, I ran the consistency check:

============
SUMMARY
============
difference found:
ReleaseNotInDb    => 12605
ReleaseYank       =>  441
CrateNotInIndex   =>   71
ReleaseNotInIndex =>    5
CrateNotInDb      =>  480
============
activities triggered:
builds queued:    13472
crates deleted:     71
releases deleted:    5
yanks corrected:   441
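For readers trying to relate the two halves of this summary, the sketch below shows one way the difference kinds could map to the triggered activities. The variant names come from the summary above; the payload fields, the `Actions` struct and the `tally` function are hypothetical, and the mapping is only inferred from the matching counts (71 crates, 5 releases, 441 yanks). It assumes a crate missing from the database queues one build per release, which would explain why more builds were queued than there are ReleaseNotInDb entries.

```rust
/// Difference kinds, named after the categories in the summary above.
/// The payloads are assumptions made for this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Difference {
    ReleaseNotInDb { name: String, version: String },
    ReleaseYank { name: String, version: String, yanked_in_index: bool },
    CrateNotInIndex { name: String },
    ReleaseNotInIndex { name: String, version: String },
    CrateNotInDb { name: String, versions: Vec<String> },
}

/// Counters matching the "activities triggered" section of the summary.
#[derive(Debug, Default, PartialEq)]
struct Actions {
    builds_queued: usize,
    crates_deleted: usize,
    releases_deleted: usize,
    yanks_corrected: usize,
}

/// Counts the activity each difference would trigger (assumed mapping).
fn tally(diffs: &[Difference]) -> Actions {
    let mut actions = Actions::default();
    for diff in diffs {
        match diff {
            // A single missing release queues a single build.
            Difference::ReleaseNotInDb { .. } => actions.builds_queued += 1,
            // A missing crate queues one build per release it has in the index.
            Difference::CrateNotInDb { versions, .. } => {
                actions.builds_queued += versions.len()
            }
            Difference::CrateNotInIndex { .. } => actions.crates_deleted += 1,
            Difference::ReleaseNotInIndex { .. } => actions.releases_deleted += 1,
            Difference::ReleaseYank { .. } => actions.yanks_corrected += 1,
        }
    }
    actions
}
```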

I'll close this issue now.

(We might run the consistency check via the scheduler at some point.)

syphar avatar Jun 24 '24 12:06 syphar