dashmap
dashmap copied to clipboard
`capacity` behavior does not align with its documentation
Either there is a bug in code, or the documentation needs to be changed for capacity. The documentation states (emphasis added):
Returns how many key-value pairs the map can store without reallocating.
The following code illustrates a reallocation occurs before inserting more than capacity key-value pairs.
use dashmap::DashMap;
fn main() {
let map = DashMap::with_capacity(192);
let cap = map.capacity();
for i in 0..cap {
map.insert(i, ());
}
assert_eq!(cap, map.len());
assert_ne!(cap, map.capacity());
}
[zack@laptop src]$ uname -a
Linux laptop 6.10.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 12 Sep 2024 17:21:02 +0000 x86_64 GNU/Linux
[zack@laptop src]$ cargo -V
cargo 1.81.0 (2dbb1af80 2024-08-20)
Unsurprisingly, the same problem exists for DashSet.
I can sporadically trigger a reallocation in as few as 25 inserts (i.e., after ≈ 13% of the originally reported capacity—which is 192 on my machine—is added).
Is it possible to calculate a lower bound like HashMap::capacity?
Returns the number of elements the map can hold without reallocating.
This number is a lower bound; the
HashMap<K, V>might be able to hold more, but is guaranteed to be able to hold at least this many.
iirc many hashmaps have a fill-factor for reallocating, which is basically the fraction of slots to be occupied before reallocating (unlike Vec, which only reallocates once its underlying allocation is full). I would image capacity here means the theoretically maximum number of non-colliding members whereas you would expect it to be fill_factor * num_entry_slots?
I am not well versed with dashmap, so this is really just a thought on why this might be the case.
I'm aware of what fill-factor is. The point of this issue is that the documentation is not correct, and the ideal fix would be to change the implementation such that it returns a lower bound on the number of items that can be inserted without causing a reallocation à la HashMap. Without this, there is no way one can ensure a DashMap does not grow past a certain threshold that would cause a re-allocation forcing one to use something like a HashMap instead.