Cargo time machine (generate lock files based on old registry state)

Open est31 opened this issue 7 years ago • 13 comments

Some time ago I wanted to check how much faster my library had become across various Rust versions. So I cloned the repo, checked out an older git commit, used rustup to get an older rustc, and tested it with both the older rustc and the newer one. With the older rustc, cargo downloaded various dependencies and tried to build them, but the build failed because the crates on crates.io required newer Rust versions than the one I was benchmarking my library with. So I figured out a trick: I told cargo not to use crates.io as its registry source but my own private clone, and I pointed that clone at a commit from around the time that compiler was released. This worked really well!

Now to my feature request. I'd like to have this automated, via a flag in cargo: if you invoke cargo generate-lockfile --registry-time 2017-01-01, cargo would check out a commit from that day from the registry and use that commit for lockfile generation.
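A minimal sketch of the selection step this would need, assuming the index history is already available as a list of commits with timestamps. `IndexCommit` and `commit_at_or_before` are hypothetical names for illustration, not anything in Cargo:

```rust
/// One commit of the registry index (hypothetical type, not part of Cargo).
/// `unix_time` is the commit time in seconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexCommit {
    pub hash: String,
    pub unix_time: i64,
}

/// Returns the newest commit whose time is at or before `cutoff`,
/// or `None` if the whole history is newer than the cutoff.
pub fn commit_at_or_before(history: &[IndexCommit], cutoff: i64) -> Option<&IndexCommit> {
    history
        .iter()
        .filter(|c| c.unix_time <= cutoff)
        .max_by_key(|c| c.unix_time)
}
```

The lockfile would then be generated against the index as of the returned commit, which is what the private-clone trick above does by hand.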

I think it is justified to call this feature "time machine" because it emulates the time from back then.

Everyone who has missed the presence of a Cargo.lock can feel this I think :).

est31 avatar Mar 21 '18 19:03 est31

This sounds like it'd be really awesome as some kind of sub-command/cargo wrapper!

So you might run cargo time-machine bench --registry-time 2017-01-01 -p my-crate and it'll run benchmarks using the version of rustc and your crate closest to that date. You could use cargo time-machine install --registry-time 2017-01-01 -p my-crate for installation, and so on.

Michael-F-Bryan avatar Mar 23 '18 11:03 Michael-F-Bryan

@Michael-F-Bryan as we've got other commands to influence cargo resolution like #4100, I think the best place for integration is Cargo itself.

est31 avatar Mar 29 '18 01:03 est31

Time aside, I'd like to have a way to do this via a git hash of the crates.io index. That'd be great for reproducing bug reports.

Also see https://github.com/rust-lang/cargo/issues/6161

joshtriplett avatar Oct 10 '18 20:10 joshtriplett

Is the publication date of a package stored somewhere? With that, it would be possible to filter package versions from ret just before the sort_unstable_by: https://github.com/rust-lang/cargo/blob/7ba6e497be1b818a6ea4628f46a533a9c7c9f871/src/cargo/core/resolver/dep_cache.rs#L166-L187
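A rough sketch of that filtering idea, assuming a publication timestamp were available per version. The `Candidate` type and its `published_at` field are hypothetical (the index is not known to carry such a field), and versions are modelled as plain tuples rather than Cargo's real types:

```rust
/// Hypothetical candidate record. `published_at` (seconds since the Unix
/// epoch) is an assumed field, not something the index is known to provide.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub version: (u64, u64, u64),
    pub published_at: i64,
}

/// Drops every candidate published after `cutoff`, then sorts the rest
/// so that the highest version comes first.
pub fn candidates_as_of(mut ret: Vec<Candidate>, cutoff: i64) -> Vec<Candidate> {
    ret.retain(|c| c.published_at <= cutoff);
    ret.sort_unstable_by(|a, b| b.version.cmp(&a.version));
    ret
}
```

With a cutoff between the second and third publication, only the two older versions survive and the resolver would never see the newer one.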

fpoli avatar May 28 '20 09:05 fpoli

Alternatively, to use a specific git hash one could modify this git fetch: https://github.com/rust-lang/cargo/blob/058baec9d9bba1d3417881ae1d5efc27c84d956b/src/cargo/sources/registry/remote.rs#L220-L223

fpoli avatar May 28 '20 09:05 fpoli

@fpoli the publication date is stored only in the git history: you have to find the commit that introduced the crate, which is quite involved algorithmically and costly in time as well. I recommend going via git commit hashes, which is what I originally envisioned.

est31 avatar May 28 '20 12:05 est31

Could crate publication dates be detected once and added to the index data directly (with future additions adding the data automatically)? It should just be a git blame for each entry, but that's mostly a guess, since I don't know offhand how the index is stored.

mathstuf avatar May 28 '20 12:05 mathstuf

@mathstuf that wouldn't detect yanked/unyanked crates. It seems that you can yank and un-yank crates arbitrarily often.
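To illustrate why a single publication timestamp is not enough: answering "was this version yanked at time T?" needs the whole sequence of yank and unyank events. A small sketch, assuming such an event log existed; `YankEvent` and `is_yanked_at` are made up for this example:

```rust
/// A yank or unyank of one crate version (hypothetical event type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum YankEvent {
    Yank,
    Unyank,
}

/// Replays timestamped events, assumed sorted by ascending time, and
/// reports whether the version was yanked as of `cutoff`.
/// A version with no events at or before `cutoff` counts as not yanked.
pub fn is_yanked_at(events: &[(i64, YankEvent)], cutoff: i64) -> bool {
    events
        .iter()
        .take_while(|ev| ev.0 <= cutoff)
        .last()
        .map_or(false, |ev| ev.1 == YankEvent::Yank)
}
```

A blame on the index entry would only recover the most recent state change, not the ones before it.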

est31 avatar May 28 '20 18:05 est31

Hmm. It seems that the index could be fetched from an arbitrary refspec.

See that https://github.com/mathstuf/rust-keyutils/blob/master%40%7b2020-01-01%7d/.cirrus.yml is returning valid contents. So, at least for github-hosted index files, this kind of URL abuse is possible. Not so sure about other index hosting locations though.

This is basically using the master@{when} syntax for refs, so arbitrary Git-supported "when" clauses are likely allowed (last week, yesterday, specific times, etc.).
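For reference, a sketch of how such a URL could be built. The helper names are made up, and the `https://github.com/<owner>/<repo>/blob/<ref>/<path>` layout is taken from the example link above rather than from any documented guarantee:

```rust
/// Percent-encodes every byte that is not an RFC 3986 unreserved character,
/// using lowercase hex digits to match the example URL.
pub fn encode_ref(git_ref: &str) -> String {
    let mut out = String::new();
    for b in git_ref.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02x}", b)),
        }
    }
    out
}

/// Builds a GitHub blob URL for `path` at `branch@{when}`.
pub fn blob_url(owner_repo: &str, branch: &str, when: &str, path: &str) -> String {
    // `{{` and `}}` are literal braces in a format string.
    let git_ref = format!("{}@{{{}}}", branch, when);
    format!(
        "https://github.com/{}/blob/{}/{}",
        owner_repo,
        encode_ref(&git_ref),
        path
    )
}
```

Calling `blob_url("mathstuf/rust-keyutils", "master", "2020-01-01", ".cirrus.yml")` reproduces the link above.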

mathstuf avatar May 28 '20 19:05 mathstuf

Hmmm nice it works from Github's API as well:

curl -i 'https://api.github.com/repos/est31/cargo-udeps/commits/master@\{2020-01-01\}'

It could be special-cased for GitHub, with a fallback to a full clone of the index repo when it doesn't detect GitHub or hits an API limit or other HTTP error.
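Roughly, the control flow could look like the sketch below. The function name is made up, the two lookups are passed in as closures so nothing here talks to the network, and the prefix check on the URL is a deliberately naive way to detect GitHub:

```rust
/// Resolves the index commit for some point in time.
/// Tries `api_lookup` only when the index is hosted on GitHub, and falls
/// back to `clone_lookup` if it is not, or if the API lookup fails
/// (rate limit, HTTP error, and so on).
pub fn resolve_index_commit<A, C>(
    index_url: &str,
    api_lookup: A,
    clone_lookup: C,
) -> Result<String, String>
where
    A: FnOnce() -> Result<String, String>,
    C: FnOnce() -> Result<String, String>,
{
    // Naive host detection; an assumption made for this sketch only.
    let is_github = index_url.starts_with("https://github.com/");
    if is_github {
        if let Ok(hash) = api_lookup() {
            return Ok(hash);
        }
    }
    clone_lookup()
}
```

The full clone stays the source of truth; the API path is only a shortcut.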

est31 avatar May 28 '20 22:05 est31

I've implemented this:

https://crates.io/crates/lts/0.2.0

kornelski avatar Jan 26 '23 17:01 kornelski

We talked about this recently somewhere. I wonder if it was in person at RustConf which means no notes.

I would expect this to be a part of an interface for cargo generate-lockfile. It doesn't need to exist everywhere.

It can't fully reproduce a lockfile from a past state because the lockfile only resolves maximally for the subsection of the dependency tree that changed. However, still being able to generate it for a given time can be useful.

When we had only the git registry, it was easy to think we could use the git history, at least until you take squashing into account.

Now we also have the sparse registry, so we'd need a design that can take that into account, including

  • What time resolution can work (or if we use some kind of counter, how to turn that into real-world measurement)
  • At what stage in the publish process do we capture that timestamp

Timestamps, instead of counters, seem really useful for dealing with the human side of this. I don't think we need fine resolution on this; if two packages are published in close succession, oh well.

I bet the server could even backfill the timestamps.

I don't think the specific stage will matter all that much either.

The big question though is yanks. We don't have a history of when things have been yanked and unyanked.

epage avatar Oct 18 '23 01:10 epage

I've posted #16265 for an initial implementation. The tracking issue for it will be #16271

epage avatar Nov 17 '25 20:11 epage