cargo-supply-chain

Consider transparently downloading the DB dump instead of fetching live results by default

Open Shnatsel opened this issue 2 years ago • 3 comments

TL;DR: run cargo supply-chain update implicitly from other commands, instead of defaulting to querying the API.

If the cache is expired or nonexistent, and --cache-max-age allows it, we could download the latest DB dump by default instead of fetching live results. This would be a lot faster in the typical case.

We would still need to fall back to querying live data from the API if the latest DB dump published by crates.io is older than --cache-max-age.
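The fallback order described above could be sketched roughly as follows. This is only an illustration of the decision logic, not code from cargo supply-chain: `DataSource` and `choose_source` are hypothetical names, and the sketch assumes the ages of the local cache and of the latest published dump are already known.

```rust
use std::time::Duration;

/// Where to read crate ownership data from (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum DataSource {
    /// A previously downloaded dump that is still fresh enough.
    LocalCache,
    /// Download the latest crates.io DB dump, then read from it.
    DbDump,
    /// Query the live crates.io API crate by crate.
    LiveApi,
}

/// Pick a data source given the age of the local cache (None if there is no
/// cache), the age of the latest published dump, and the --cache-max-age limit.
fn choose_source(
    cache_age: Option<Duration>,
    dump_age: Duration,
    max_age: Duration,
) -> DataSource {
    match cache_age {
        // The cache exists and has not expired: use it as-is.
        Some(age) if age <= max_age => DataSource::LocalCache,
        // Cache missing or expired, but the published dump is fresh enough.
        _ if dump_age <= max_age => DataSource::DbDump,
        // Even the newest dump is too old: fall back to live queries.
        _ => DataSource::LiveApi,
    }
}
```

With a 48-hour limit, a missing cache and a 5-hour-old dump would select `DataSource::DbDump`, while a 1-hour limit with the same dump would select `DataSource::LiveApi`.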

Shnatsel avatar Mar 18 '22 17:03 Shnatsel

The part of the dump we need seems to only require a 50 MB download, which is not too bad.

Shnatsel avatar Mar 18 '22 17:03 Shnatsel

@Shnatsel Are there any "gotchas" you would anticipate, were someone to try to implement this?

smoelius avatar Apr 02 '24 11:04 smoelius

The database dumps are not officially in a stable format, so I could see the format changing in the future and the tool breaking. However, in practice the parts we care about have not changed in years.

The database download also relies on a somewhat fragile order of the files in the archive to reduce the download size. This could easily change without warning and increase the download size considerably.

Finally, these aren't really live results (up to 48 hours out of date by default), but that is probably fine as long as we display a warning about it.

I don't foresee any issues within the code of cargo supply-chain itself.
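As a rough illustration of the staleness warning mentioned above, a helper along these lines could build the message from the dump's age. The name `staleness_warning` and the wording of the message are assumptions made for this sketch; they do not come from the tool.

```rust
use std::time::Duration;

/// Build a warning telling the user how old the dump-based results are.
/// Hypothetical helper: the name and the message text are illustrative only.
fn staleness_warning(dump_age: Duration) -> String {
    // Whole hours, rounded down; sub-hour precision is not useful here.
    let hours = dump_age.as_secs() / 3600;
    format!(
        "warning: results come from a crates.io database dump that is about {} hours old and may be out of date",
        hours
    )
}
```

A caller would print this to stderr with `eprintln!` before showing results, so that it does not mix into machine-readable output on stdout.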

Shnatsel avatar Apr 02 '24 12:04 Shnatsel