Use another hashing algorithm? 250K calls to Siphash
While profiling Clippy I came accross 250k calls to <core::hash::sip::Hasher<core::hash::sip::Sip13Rounds> as core::hash::Hasher>::write, and they turn out to be the most time-consuming by themselves (at least that's what callgrind says), all of them coming from Cargo.
These numbers in particular is from bumpalo-3.16.0
The 4 callers of this function are from hashing &url::Url (hash_one -> 26 296 times), hashing &semver::Version (hash_one -> 83 342x), hashing package_id::PackageId (hash -> 48 554x) and dependency::Dependency (hash -> 31 099x)
To reproduce
cd bumpalo-3.16.0
CARGO_TARGET_DIR=/tmp/m1 valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --trace-children=yes /home/meow/.rustup/toolchains/nightly-2025-05-31-x86_64-unknown-linux-gnu/bin/cargo clippy
Note that this is without incremental compilation, thus the CARGO_TARGET_DIR env var.
I'm not sure if there's a reason why Cargo uses the default hasher.
Cargo leverages https://github.com/rust-lang/rustc-stable-hash, as we want path on file system as stable as possible, regardless endianness or platform. Tha said, not everything in this codebase has this requirement There are some previous attempts but got stale https://github.com/rust-lang/cargo/pull/14665
Edit: Oh I see. You're talking about siphash from std not from rustc-stable-hash. The thing about stable hash may be irrelevant here.
I wonder how bad it was in terms of total run time of the clippy command being benchmarked.
On this crate, it seems to represent a 9.89% of the runtime, with 14.155.123 instructions.
If we do this, a question is which hash. Currently, Cargo depends on ahash through [email protected]. However, [email protected] has switched to foldhash ( rust-lang/hashbrown#563). For toml, I'm following suit in toml-rs/toml#956
On this crate, it seems to represent a 9.89% of the runtime, with 14.155.123 instructions.
I want to add a note here, part of the Cargo runtime, Clippy having so many lints, these 14 million instructions is not that relevant. I'm talking from the Cargo process point of view.
If we move forward with this, we should probably enable tomls fast_hash feature.
Also, in seeing how slow BTreeMaps can be compared to hash maps, I wonder if there are places where behavior wouldn't be negatively impacted in switching to IndexMaps. See https://epage.github.io/blog/2025/07/toml-09/ for some comparisons.
With the toml changes in, I was looking at where we spend our time
callgrind:
- Interesting to see how hashing compares to other operations
CARGO_LOG_PROFILE
- When zooming in, there are a lot of gaps that I'd like to better understand.