cargo Use another hashing algorithm? 250K calls to Siphash

While profiling Clippy I came accross 250k calls to <core::hash::sip::Hasher<core::hash::sip::Sip13Rounds> as core::hash::Hasher>::write, and they turn out to be the most time-consuming by themselves (at least that's what callgrind says), all of them coming from Cargo.

These numbers in particular is from bumpalo-3.16.0

The 4 callers of this function are from hashing &url::Url (hash_one -> 26 296 times), hashing &semver::Version (hash_one -> 83 342x), hashing package_id::PackageId (hash -> 48 554x) and dependency::Dependency (hash -> 31 099x)

To reproduce

cd bumpalo-3.16.0
CARGO_TARGET_DIR=/tmp/m1 valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --trace-children=yes /home/meow/.rustup/toolchains/nightly-2025-05-31-x86_64-unknown-linux-gnu/bin/cargo clippy

Note that this is without incremental compilation, thus the CARGO_TARGET_DIR env var.

I'm not sure if there's a reason why Cargo uses the default hasher.

Jun 09 '25 22:06 blyxyas

Cargo leverages https://github.com/rust-lang/rustc-stable-hash, as we want path on file system as stable as possible, regardless endianness or platform. Tha said, not everything in this codebase has this requirement There are some previous attempts but got stale https://github.com/rust-lang/cargo/pull/14665

Edit: Oh I see. You're talking about siphash from std not from rustc-stable-hash. The thing about stable hash may be irrelevant here.

Jun 09 '25 22:06 weihanglo

I wonder how bad it was in terms of total run time of the clippy command being benchmarked.

Jun 09 '25 22:06 weihanglo

On this crate, it seems to represent a 9.89% of the runtime, with 14.155.123 instructions.

Jun 10 '25 10:06 blyxyas

If we do this, a question is which hash. Currently, Cargo depends on ahash through [email protected]. However, [email protected] has switched to foldhash ( rust-lang/hashbrown#563). For toml, I'm following suit in toml-rs/toml#956

Jun 16 '25 20:06 epage

On this crate, it seems to represent a 9.89% of the runtime, with 14.155.123 instructions.

I want to add a note here, part of the Cargo runtime, Clippy having so many lints, these 14 million instructions is not that relevant. I'm talking from the Cargo process point of view.

Jun 17 '25 00:06 blyxyas

If we move forward with this, we should probably enable tomls fast_hash feature.

Also, in seeing how slow BTreeMaps can be compared to hash maps, I wonder if there are places where behavior wouldn't be negatively impacted in switching to IndexMaps. See https://epage.github.io/blog/2025/07/toml-09/ for some comparisons.

Jul 08 '25 19:07 epage

With the toml changes in, I was looking at where we spend our time

callgrind:

Interesting to see how hashing compares to other operations

CARGO_LOG_PROFILE

When zooming in, there are a lot of gaps that I'd like to better understand.

Jul 09 '25 18:07 epage