
feat: index cache in SQLite3


What does this PR try to resolve?

https://github.com/rust-lang/cargo/issues/6908

Add an unstable feature to store the index cache in a SQLite3 database.

The flag is undocumented since the schema is just a naive dump. The implementation can be used as a benchmark baseline while we explore other cache/SQL schema designs.

How should we test and review this PR?

Run this to force using SQLite for the index cache:

```console
__CARGO_TEST_FORCE_SQLITE_INDEX_CACHE=1 cargo test -p cargo --test testsuite
```

You'll find the SQLite3 db file at `registry/index/<index-url>/.cache/index-cache.db`.

Additional information

— weihanglo, Mar 14 '24 21:03

r? @epage

rustbot has assigned @epage. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use `r?` to explicitly pick a reviewer

— rustbot, Mar 14 '24 21:03

❗ Help wanted: benchmark on Windows and HDD ❗

Benchmark result

  • commit: rust-lang/cargo@58655e5e70d350af0eb2b469086e6a57005c0cdc
  • project: awslabs/aws-sdk-rust@f8493097187e6b7c35b29f182ea897dd61b0a4f6

Cache write

Result: the naive SQLite cache is significantly slower than the fs cache on macOS and Linux. This might be because the index cache blobs are too big. There is a post on the SQLite website about storing blobs inside the DB file versus in separate files: https://www.sqlite.org/intern-v-extern-blob.html. The rule of thumb from that experiment is that files larger than 100 KB are better stored separately. I've picked some popular crates, and most of them are over 50 KB (see the list below; a sketch for reproducing the measurement follows it).

See the list of randomly-picked popular crates

```csv
size-in-kb,crate
15,sha1
19,memchr
23,base64
33,walkdir
35,tar
36,log
42,sha2
42,thiserror
44,http
54,anyhow
55,libc
59,proc-macro2
63,tracing
67,url
80,rand
92,futures
131,serde
160,tonic
174,openssl
175,regex
398,windows-sys
460,syn
490,reqwest
568,tokio
572,cargo
603,hyper
845,rustix
```
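
For reference, a minimal sketch (std-only Rust; the cache path is a placeholder you'd substitute yourself) of how the size list above can be reproduced by walking the old fs cache directory:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Recursively collect (size-in-KB, file name) for every cache entry.
fn walk(dir: &Path, out: &mut Vec<(u64, String)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if path.is_dir() {
            walk(&path, out)?;
        } else {
            let kb = entry.metadata()?.len() / 1024;
            out.push((kb, entry.file_name().to_string_lossy().into_owned()));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Placeholder path; use your own CARGO_HOME and index hash.
    let cache = Path::new("/path/to/cargo-home/registry/index/<index-url>/.cache");
    let mut sizes = Vec::new();
    walk(cache, &mut sizes)?;
    sizes.sort();
    println!("size-in-kb,crate");
    for (kb, name) in sizes {
        println!("{kb},{name}");
    }
    Ok(())
}
```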

benchmark script

```sh
hyperfine --warmup 2 \
  --min-runs 100 \
  -L ZFLAG -Zindex-cache-sqlite, \
  --setup "CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' fetch" \
  --prepare 'rm -rf "$MY_CARGO_HOME/registry/index/<git-index-url-and-hash>/.cache"' \
  "CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
```
Result on Linux with SSD

AWS EC2 c6a.8xlarge
vCPU: 32
host: x86_64-unknown-linux-gnu
os: Amazon Linux AMI 2.0.0 [64-bit]
disk: AWS EBS SSD 500GB (NVMe)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| SQLite cache | 2.829 ± 0.853 | 2.446 | 5.689 | 4.22 ± 1.27 |
| old fs cache | 0.670 ± 0.004 | 0.664 | 0.677 | 1.00 👑 |
Result on macOS with SSD

Apple Macbook M1 Pro
CPU: 10 (8 + 2)
mem: 32 GB
host: aarch64-apple-darwin
os: Mac OS 14.3.1 [64-bit]

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| SQLite cache | 777.4 ± 8.8 | 759.4 | 802.4 | 1.34 ± 0.02 |
| old fs cache | 578.8 ± 4.7 | 572.3 | 604.6 | 1.00 👑 |

Cache read

Result: on cache read, there is no significant difference between the SQLite cache and the old fs cache, on either Linux or macOS.

benchmark script

```sh
hyperfine --warmup 2 \
  --min-runs 100 \
  -L ZFLAG -Zindex-cache-sqlite, \
  --setup "CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile {ZFLAG}" \
  "CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
```
Result on Linux with SSD

AWS EC2 c6a.8xlarge
vCPU: 32
host: x86_64-unknown-linux-gnu
os: Amazon Linux AMI 2.0.0 [64-bit]
disk: AWS EBS SSD 500GB (NVMe)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| SQLite cache | 446.9 ± 4.1 | 439.8 | 458.4 | 1.01 ± 0.01 |
| old fs cache | 441.3 ± 4.6 | 430.6 | 460.7 | 1.00 👑 |
Result on macOS with SSD

Apple Macbook M1 Pro
CPU: 10 (8 + 2)
mem: 32 GB
host: aarch64-apple-darwin
os: Mac OS 14.3.1 [64-bit]

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| SQLite cache | 329.0 ± 5.5 | 316.9 | 346.2 | 1.00 |
| old fs cache | 330.3 ± 6.0 | 315.6 | 352.5 | 1.00 ± 0.02 👑 |

Disk usage

  • Linux
    • SQLite cache: 17,555,456
    • old fs cache: 17,303,091
  • macOS
    • SQLite cache: 17,555,456
    • old fs cache: 17,303,091
  • Windows

— weihanglo, Mar 15 '24 05:03

One of the things that was important for the last-use performance was to batch the inserts in a single transaction. From what I can tell, this is using a separate transaction for each insert which can be expensive. That might be something to experiment with.
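
For illustration, a minimal sketch of that batching, assuming the rusqlite crate and a hypothetical `summaries` table (not the PR's actual code):

```rust
use rusqlite::{params, Connection};

// Write all cache entries inside one explicit transaction, so SQLite issues
// a single COMMIT (and fsync) instead of one per INSERT.
fn write_all(conn: &mut Connection, entries: &[(String, Vec<u8>)]) -> rusqlite::Result<()> {
    let tx = conn.transaction()?;
    {
        let mut stmt =
            tx.prepare("INSERT OR REPLACE INTO summaries (name, blob) VALUES (?1, ?2)")?;
        for (name, blob) in entries {
            stmt.execute(params![name, blob])?;
        }
    }
    tx.commit()
}
```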

Another thought I had was that the schema could be (name, version, blob), so that the blob is only stored per index line which should be a lot smaller than the entire blob of all versions. That would require some deeper changes of course. It might also be tricky to avoid too much JSON parsing overhead to reliably extract the version.
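
A rough sketch of that per-version layout (hypothetical schema and data, again assuming rusqlite):

```rust
use rusqlite::{params, Connection};

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open("index-cache.db")?;
    // One row per index line, keyed by (name, version), so a lookup reads a
    // small per-version blob instead of the whole crate's history.
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS summaries (
            name    TEXT NOT NULL,
            version TEXT NOT NULL,
            blob    BLOB NOT NULL,
            PRIMARY KEY (name, version)
        )",
    )?;
    conn.execute(
        "INSERT OR REPLACE INTO summaries (name, version, blob) VALUES (?1, ?2, ?3)",
        params!["serde", "1.0.0", &b"{\"name\":\"serde\",\"vers\":\"1.0.0\"}"[..]],
    )?;
    Ok(())
}
```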

Have you also tried adjusting the page size?

— ehuss, Mar 20 '24 02:03

Yes, both are good ideas, though they require more than trivial changes and make it hard to share the same interface with the old cache mechanism 😞.

> Have you also tried adjusting the page size?

Yes. I also tried a series of combinations of WAL journaling, cache size/limit, synchronous=NORMAL, and other pragmas. The differences were insignificant.
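
For reference, a sketch of the kind of pragma tuning tried (rusqlite assumed; as noted above, none of it made a significant difference here):

```rust
use rusqlite::Connection;

fn tune(conn: &Connection) -> rusqlite::Result<()> {
    // Write-ahead logging plus relaxed syncing trades some durability for speed.
    conn.pragma_update(None, "journal_mode", "WAL")?;
    conn.pragma_update(None, "synchronous", "NORMAL")?;
    // A negative cache_size is interpreted as KiB rather than pages (~64 MiB here).
    conn.pragma_update(None, "cache_size", -65536)?;
    Ok(())
}
```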

— weihanglo, Mar 20 '24 03:03

☔ The latest upstream changes (presumably #13632) made this pull request unmergeable. Please resolve the merge conflicts.

— bors, Mar 23 '24 21:03

> batch the inserts in a single transaction. From what I can tell, this is using a separate transaction for each insert which can be expensive. That might be something to experiment with.

I've pushed some variants of this:

  • deferred: 4b4a9934f86 — defer all insertions until dropping CacheManager (see the sketch after this list).
  • One-JSON-per-row: aadab5fb97a — deferred + change the schema to (name, version, one line of JSON blob)
  • blob dump: 4ab7aab0ffb — this is the initial "dump everything to db" integration
  • old fs: 3b43aa875fa2
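
For a feel of why the deferred variant is awkward, a rough sketch (hypothetical types, assuming rusqlite; not the actual CacheManager) of queuing writes in memory and flushing them in one transaction on Drop:

```rust
use rusqlite::{params, Connection};

// Hypothetical stand-in for the PR's CacheManager.
struct DeferredCache {
    conn: Connection,
    pending: Vec<(String, Vec<u8>)>,
}

impl DeferredCache {
    fn put(&mut self, name: String, blob: Vec<u8>) {
        // No I/O here; the write is only queued in memory.
        self.pending.push((name, blob));
    }
}

impl Drop for DeferredCache {
    fn drop(&mut self) {
        let rows = std::mem::take(&mut self.pending);
        let flush = |conn: &mut Connection| -> rusqlite::Result<()> {
            let tx = conn.transaction()?;
            {
                let mut stmt =
                    tx.prepare("INSERT OR REPLACE INTO summaries (name, blob) VALUES (?1, ?2)")?;
                for (name, blob) in &rows {
                    stmt.execute(params![name, blob])?;
                }
            }
            tx.commit()
        };
        // The Result is discarded: drop() has no way to report failure, which
        // is part of why deferring insertion until drop feels hacky.
        let _ = flush(&mut self.conn);
    }
}
```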

On macOS, both read and write performance of the two new variants (deferred and one-JSON-per-row) are on par with the original filesystem cache.

| Read | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| one-JSON-per-row | 473.5 ± 6.5 | 460.1 | 503.1 | 1.00 ± 0.02 |
| deferred | 476.2 ± 13.1 | 459.9 | 528.0 | 1.01 ± 0.03 |
| blob dump | 480.1 ± 13.4 | 459.8 | 519.8 | 1.02 ± 0.03 |
| old fs | 472.1 ± 7.6 | 461.2 | 503.9 | 1.00 |

| Write | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| one-JSON-per-row | 747.4 ± 5.7 | 740.1 | 760.3 | 1.03 ± 0.02 |
| deferred | 724.0 ± 11.8 | 706.4 | 759.0 | 1.00 |
| blob dump | 1001.5 ± 37.6 | 951.5 | 1085.7 | 1.38 ± 0.06 |
| old fs | 756.0 ± 9.7 | 747.0 | 786.4 | 1.04 ± 0.02 |

On Linux, however, there is still a 5% performance hit on write, even with the best variant.

| Read | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| one-JSON-per-row | 569.5 ± 3.6 | 560.4 | 582.7 | 1.02 ± 0.01 |
| deferred | 565.8 ± 4.7 | 558.1 | 580.9 | 1.01 ± 0.01 |
| blob dump | 566.6 ± 5.0 | 557.3 | 584.4 | 1.02 ± 0.01 |
| old fs | 557.7 ± 5.6 | 549.6 | 580.7 | 1.00 |

| Write | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| one-JSON-per-row | 901.6 ± 6.1 | 891.8 | 913.2 | 1.09 ± 0.01 |
| deferred | 867.4 ± 7.8 | 855.5 | 889.6 | 1.05 ± 0.01 |
| blob dump | 2908.5 ± 101.4 | 2792.4 | 3209.3 | 3.53 ± 0.13 |
| old fs | 823.9 ± 6.4 | 809.0 | 841.8 | 1.00 |

@ehuss, do you have time to run a simple benchmark on Windows?

I also wonder whether it is worth pursuing, given:

  • the complexity we'll introduce
  • deferring insertion until drop seems too hacky
  • the database solution might not scale well if blobs become too large.

— weihanglo, Apr 02 '24 20:04

Going to close this as it is not going anywhere in the near future. We've collected some interesting data points, and people can look into them when needed :)

— weihanglo, Jul 18 '24 21:07