crates.io
Experimental database dumps changelog
This is a low-traffic issue tracking all the changes happening to the experimental database dumps. We recommend subscribing to this issue to get notified whenever we make some changes to the contents of the dumps.
The next crates.io deploy (happening in the next few days) will include the following changes to the database dumps:
- PR #3612: The `textsearchable_index_col` column will be removed from `crates.csv`, as that column is an implementation detail of crates.io's search. Users importing the database dumps into a PostgreSQL database will not be affected by this change, as a trigger will populate that column at import time.
- PR #3611: The `version_downloads.csv` file will only include the last 90 days of data instead of full day-to-day historical data. Cumulative download counts are still available in `crates.csv` and `versions.csv`.
- PR #3549: The `version_authors.csv` file will be removed, as that data was deleted from the crates.io database too.
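
Because `version_downloads.csv` only covers a limited window of days, sums computed from it are recent totals, not all-time counts. The following is a minimal Python sketch of such an aggregation. It assumes the file has `version_id` and `downloads` columns; those column names are not stated in this changelog and should be checked against the header of the dump you downloaded. The function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def recent_downloads_by_version(lines: Iterable[str]) -> Counter:
    """Sum the per-day download counts for each version.

    `lines` is any iterable of CSV lines, e.g. an open text file.
    Assumes `version_id` and `downloads` columns (an assumption, see above).
    """
    totals: Counter = Counter()
    for row in csv.DictReader(lines):
        totals[int(row["version_id"])] += int(row["downloads"])
    return totals
```

With a file on disk this could be called as `recent_downloads_by_version(open(path, newline=""))`, where `path` points at the extracted `version_downloads.csv`.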
We also plan to make the following changes in the future:
- Issue #3479: All the data from `version_downloads.csv` will be moved out of the database dump into separate files, one for each day. This will allow clients interested in this data to download it separately.
Two relevant changes were just deployed:
- https://github.com/rust-lang/crates.io/pull/5077
- https://github.com/rust-lang/crates.io/pull/5074
- https://github.com/rust-lang/crates.io/pull/8155 will delete the `badges` table.
- https://github.com/rust-lang/crates.io/pull/8232 added a new `crate_downloads` table, which is supposed to replace the `crates.downloads` column soon. This was done for performance reasons, to reduce the amount of bloat in the `crates` table from the regular `downloads` column updates. At the moment the data should be in sync, but if everything works out we will stop writing to the `crates.downloads` column in the near future and eventually remove it.
- As mentioned in the last update, https://github.com/rust-lang/crates.io/pull/8295 is going to disable writes to the `crates.downloads` column. We will keep the column around for now to avoid unnecessary schema churn, but once the system has shown the expected performance benefits we will most likely remove the column completely.
- Once https://github.com/rust-lang/crates.io/pull/8233 is merged and deployed, it will remove the `crates.downloads` column. Please use the `crate_downloads` table instead.
- https://github.com/rust-lang/crates.io/pull/8484 will introduce a new experimental `default_versions` table that maps each crate to its "default" version, i.e. the version that will be shown by the frontend and used in e.g. reverse dependency queries.
- https://github.com/rust-lang/crates.io/pull/8748 added an experimental ZIP file artifact at https://static.crates.io/db-dump.zip. The advantage of this file is that you do not have to decompress the entire archive if you only need the CSV file of a single database table. Compared to the tarball, the ZIP file does not have a top-level datetime path prefix; otherwise the files should contain the exact same data.
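
To illustrate the last few updates together, here is a minimal Python sketch that opens only two tables from an already downloaded copy of the ZIP artifact and looks up download counts via the `crate_downloads` table instead of the removed `crates.downloads` column. Several details are assumptions that are not stated in this changelog: that the CSV members are named `crates.csv` and `crate_downloads.csv` somewhere inside the archive, that `crates.csv` has `id` and `name` columns, and that `crate_downloads.csv` has `crate_id` and `downloads` columns. All function names are hypothetical.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, Iterator, Union


def find_member(archive: zipfile.ZipFile, filename: str) -> str:
    """Return the archive path of the first member whose basename is `filename`.

    Matching on the basename avoids hard-coding the directory layout,
    which is an unknown here.
    """
    for name in archive.namelist():
        if name.rsplit("/", 1)[-1] == filename:
            return name
    raise KeyError(filename)


def read_table(archive: zipfile.ZipFile, filename: str) -> Iterator[Dict[str, str]]:
    """Stream the rows of one CSV member; other members are not decompressed."""
    with archive.open(find_member(archive, filename)) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        yield from csv.DictReader(text)


def downloads_by_crate_name(dump: Union[str, BinaryIO]) -> Dict[str, int]:
    """Join crates.csv with crate_downloads.csv (assumed column names)."""
    with zipfile.ZipFile(dump) as archive:
        names = {row["id"]: row["name"] for row in read_table(archive, "crates.csv")}
        return {
            names[row["crate_id"]]: int(row["downloads"])
            for row in read_table(archive, "crate_downloads.csv")
            if row["crate_id"] in names
        }
```

A call such as `downloads_by_crate_name("db-dump.zip")` would work on a local copy of the file. If a table contains very large text fields, the `csv` module's default field size limit may need to be raised with `csv.field_size_limit` before reading.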