feat: index cache in SQLite3
What does this PR try to resolve?
https://github.com/rust-lang/cargo/issues/6908
Add an unstable feature to store the index cache in a SQLite3 database.
The flag is undocumented since the schema is just a naive dump. The implementation can be used as a benchmark baseline while we explore other cache/SQL schema designs.
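For context, a minimal sketch of what a naive key-to-blob dump could look like with rusqlite. This is illustrative only; the table and column names are made up and are not the PR's actual schema.

```rust
use rusqlite::{params, Connection, OptionalExtension, Result};

// Hypothetical naive layout: one row per index file, with the whole serialized
// cache blob stored as-is.
fn open_cache(path: &std::path::Path) -> Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS index_cache (
            key  TEXT PRIMARY KEY, -- e.g. the lowercased crate name
            blob BLOB NOT NULL     -- serialized summaries of all versions
        );",
    )?;
    Ok(conn)
}

fn put(conn: &Connection, key: &str, blob: &[u8]) -> Result<()> {
    conn.execute(
        "INSERT OR REPLACE INTO index_cache (key, blob) VALUES (?1, ?2)",
        params![key, blob],
    )?;
    Ok(())
}

fn get(conn: &Connection, key: &str) -> Result<Option<Vec<u8>>> {
    conn.query_row(
        "SELECT blob FROM index_cache WHERE key = ?1",
        params![key],
        |row| row.get(0),
    )
    .optional()
}
```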
How should we test and review this PR?
Run this to force the test suite to use SQLite for the index cache:
__CARGO_TEST_FORCE_SQLITE_INDEX_CACHE=1 cargo test -p cargo --test testsuite
You'll find the SQLite3 database file at registry/index/<index-url>/.cache/index-cache.db.
Additional information
r? @epage
rustbot has assigned @epage. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
❗ Help wanted: benchmark on Windows and HDD ❗
Benchmark result
- commit: rust-lang/cargo@58655e5e70d350af0eb2b469086e6a57005c0cdc
- project: awslabs/aws-sdk-rust@f8493097187e6b7c35b29f182ea897dd61b0a4f6
Cache write
Result: the naive SQLite cache is significantly slower than the fs cache on macOS and Linux. This might be because the index cache blobs are too big. There is a post on the SQLite website about storing blobs inside the DB file versus in separate files: https://www.sqlite.org/intern-v-extern-blob.html. The rule of thumb from that experiment is that blobs larger than about 100 KB are better stored as separate files. I've picked some popular crates, and most of their cache entries are over 50 KB.
See the list of randomly-picked popular crates
size-in-kb,crate
15,sha1
19,memchr
23,base64
33,walkdir
35,tar
36,log
42,sha2
42,thiserror
44,http
54,anyhow
55,libc
59,proc-macro2
63,tracing
67,url
80,rand
92,futures
131,serde
160,tonic
174,openssl
175,regex
398,windows-sys
460,syn
490,reqwest
568,tokio
572,cargo
603,hyper
845,rustix
benchmark script
hyperfine --warmup 2 \
--min-runs 100 \
-L ZFLAG -Zindex-cache-sqlite, \
--setup "CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' fetch" \
--prepare 'rm -rf "$MY_CARGO_HOME/registry/index/<git-index-url-and-hash>/.cache"' \
"CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
Result on Linux with SSD
AWS EC2 c6a.8xlarge
vCPU: 32
host: x86_64-unknown-linux-gnu
os: Amazon Linux AMI 2.0.0 [64-bit]
disk: AWS EBS SSD 500GB (NVMe)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| SQLite cache | 2.829 ± 0.853 | 2.446 | 5.689 | 4.22 ± 1.27 |
| old fs cache | 0.670 ± 0.004 | 0.664 | 0.677 | 1.00 👑 |
Result on macOS with SSD
Apple Macbook M1 Pro
CPU: 10 (8 + 2)
mem: 32 GB
host: aarch64-apple-darwin
os: Mac OS 14.3.1 [64-bit]
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| SQLite cache | 777.4 ± 8.8 | 759.4 | 802.4 | 1.34 ± 0.02 |
| old fs cache | 578.8 ± 4.7 | 572.3 | 604.6 | 1.00 👑 |
Cache read
Result: no significant difference between the SQLite cache and the fs cache on cache read, on either Linux or macOS.
benchmark script
hyperfine --warmup 2 \
--min-runs 100 \
-L ZFLAG -Zindex-cache-sqlite, \
--setup "CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile {ZFLAG}" \
"CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
Result on Linux with SSD
AWS EC2 c6a.8xlarge
vCPU: 32
host: x86_64-unknown-linux-gnu
os: Amazon Linux AMI 2.0.0 [64-bit]
disk: AWS EBS SSD 500GB (NVMe)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| SQLite cache | 446.9 ± 4.1 | 439.8 | 458.4 | 1.01 ± 0.01 |
| old fs cache | 441.3 ± 4.6 | 430.6 | 460.7 | 1.00 👑 |
Result on macOS with SSD
Apple Macbook M1 Pro
CPU: 10 (8 + 2)
mem: 32 GB
host: aarch64-apple-darwin
os: Mac OS 14.3.1 [64-bit]
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| SQLite cache | 329.0 ± 5.5 | 316.9 | 346.2 | 1.00 |
| old fs cache | 330.3 ± 6.0 | 315.6 | 352.5 | 1.00 ± 0.02 👑 |
Disk usage
- Linux
  - SQLite cache: 17,555,456 bytes
  - old fs cache: 17,303,091 bytes
- macOS
  - SQLite cache: 17,555,456 bytes
  - old fs cache: 17,303,091 bytes
- Windows
One of the things that was important for the last-use performance was to batch the inserts in a single transaction. From what I can tell, this is using a separate transaction for each insert, which can be expensive. That might be something to experiment with.
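As a rough illustration of the batching idea with rusqlite (reusing the hypothetical index_cache table from the sketch above; this is not the PR's code):

```rust
use rusqlite::{params, Connection, Result};

// Wrap many cache writes in one explicit transaction instead of letting SQLite
// open (and fsync) an implicit transaction per INSERT.
fn write_batch(conn: &mut Connection, entries: &[(String, Vec<u8>)]) -> Result<()> {
    let tx = conn.transaction()?; // single BEGIN ... COMMIT around all inserts
    {
        let mut stmt = tx.prepare_cached(
            "INSERT OR REPLACE INTO index_cache (key, blob) VALUES (?1, ?2)",
        )?;
        for (key, blob) in entries {
            stmt.execute(params![key, blob])?;
        }
    }
    tx.commit()
}
```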
Another thought I had was that the schema could be (name, version, blob), so that the blob is only stored per index line, which should be a lot smaller than the entire blob of all versions. That would require some deeper changes, of course. It might also be tricky to avoid too much JSON parsing overhead to reliably extract the version.
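A sketch of that hypothetical layout (illustrative only, not code from the PR):

```rust
// One row per index line instead of one blob covering every version of a crate.
// Reading a crate back becomes a multi-row query such as
// `SELECT version, blob FROM index_cache WHERE name = ?1`.
const PER_VERSION_SCHEMA: &str = "
    CREATE TABLE IF NOT EXISTS index_cache (
        name    TEXT NOT NULL, -- crate name
        version TEXT NOT NULL, -- version extracted from the index line
        blob    BLOB NOT NULL, -- that single line of JSON
        PRIMARY KEY (name, version)
    );
";
```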
Have you also tried adjusting the page size?
Yes, both are good ideas, though they require more than trivial changes and make it hard to share the same interface with the old cache mechanism 😞.
> Have you also tried adjusting the page size?
Yes. I also tried a series of combinations of WAL journal mode, cache size/limit, synchronous=NORMAL, and other pragmas. The difference was insignificant.
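For reference, a rough sketch of applying that kind of pragma tuning with rusqlite; the specific values here are illustrative, and as noted above they made no significant difference in these benchmarks:

```rust
use rusqlite::{Connection, Result};

// Applied once, right after opening the connection.
fn tune(conn: &Connection) -> Result<()> {
    // `PRAGMA journal_mode` returns a row, so read the result back with query_row.
    let _mode: String =
        conn.query_row("PRAGMA journal_mode = WAL;", [], |row| row.get(0))?;
    conn.execute_batch(
        "PRAGMA synchronous = NORMAL;
         PRAGMA cache_size = -20000; -- negative value = cache size in KiB (~20 MB)
         PRAGMA page_size = 8192;    -- only effective before the first page is written (or after VACUUM)
         PRAGMA temp_store = MEMORY;",
    )?;
    Ok(())
}
```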
☂️ The latest upstream changes (presumably #13632) made this pull request unmergeable. Please resolve the merge conflicts.
> batch the inserts in a single transaction. From what I can tell, this is using a separate transaction for each insert which can be expensive. That might be something to experiment with.
I've pushed some variants of this:
- deferred (4b4a9934f86): defer all insertions until dropping CacheManager; see the sketch after this list
- one-JSON-per-row (aadab5fb97a): deferred + change the schema to (name, version, one line of JSON blob)
- blob dump (4ab7aab0ffb): this is the initial "dump everything to db" integration
- old fs (3b43aa875fa2): the original filesystem cache
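For the deferred variant, here is a rough, hypothetical sketch of the idea: buffer writes in memory and flush them in a single transaction on drop. Names are illustrative; this is not the CacheManager code from the branch.

```rust
use rusqlite::{params, Connection, Result};

// Buffers cache writes and flushes them all at once when dropped.
struct DeferredCache {
    conn: Connection,
    pending: Vec<(String, Vec<u8>)>,
}

impl DeferredCache {
    fn put(&mut self, key: String, blob: Vec<u8>) {
        self.pending.push((key, blob)); // no I/O here
    }

    fn flush(&mut self) -> Result<()> {
        let tx = self.conn.transaction()?; // one transaction for all pending rows
        {
            let mut stmt = tx.prepare_cached(
                "INSERT OR REPLACE INTO index_cache (key, blob) VALUES (?1, ?2)",
            )?;
            for (key, blob) in self.pending.drain(..) {
                stmt.execute(params![key, blob])?;
            }
        }
        tx.commit()
    }
}

impl Drop for DeferredCache {
    fn drop(&mut self) {
        let _ = self.flush(); // best-effort write; errors are ignored on drop
    }
}
```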
On macOS, both the read and write performance of the two new variants (deferred and one-JSON-per-row) are on par with the original filesystem cache.
| Read | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| one-JSON-per-row | 473.5 ± 6.5 | 460.1 | 503.1 | 1.00 ± 0.02 |
| deferred | 476.2 ± 13.1 | 459.9 | 528.0 | 1.01 ± 0.03 |
| blob dump | 480.1 ± 13.4 | 459.8 | 519.8 | 1.02 ± 0.03 |
| old fs | 472.1 ± 7.6 | 461.2 | 503.9 | 1.00 |
| Write | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| one-JSON-per-row | 747.4 ± 5.7 | 740.1 | 760.3 | 1.03 ± 0.02 |
| deferred | 724.0 ± 11.8 | 706.4 | 759.0 | 1.00 |
| blob dump | 1001.5 ± 37.6 | 951.5 | 1085.7 | 1.38 ± 0.06 |
| old fs | 756.0 ± 9.7 | 747.0 | 786.4 | 1.04 ± 0.02 |
On Linux, however, the new variants still take a 5–9% performance hit on write (deferred and one-JSON-per-row respectively).
| Read | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| one-JSON-per-row | 569.5 ± 3.6 | 560.4 | 582.7 | 1.02 ± 0.01 |
| deferred | 565.8 ± 4.7 | 558.1 | 580.9 | 1.01 ± 0.01 |
| blob dump | 566.6 ± 5.0 | 557.3 | 584.4 | 1.02 ± 0.01 |
| old fs | 557.7 ± 5.6 | 549.6 | 580.7 | 1.00 |
| Write | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| one-JSON-per-row | 901.6 ± 6.1 | 891.8 | 913.2 | 1.09 ± 0.01 |
| deferred | 867.4 ± 7.8 | 855.5 | 889.6 | 1.05 ± 0.01 |
| blob dump | 2908.5 ± 101.4 | 2792.4 | 3209.3 | 3.53 ± 0.13 |
| old fs | 823.9 ± 6.4 | 809.0 | 841.8 | 1.00 |
@ehuss do you have time to run a simple benchmark on Windows?
I also wonder if it is not worth it, due to:
- the complexity we'll introduce
- deferring insertion until drop seems too hacky
- the database solution might not scale well if the blobs become too large.
Going to close this as it is not going anywhere in the near future. We've collected some interesting data points, and people can look into them when needed :)