hashbrown icon indicating copy to clipboard operation
hashbrown copied to clipboard

Reduce size_of<HashMap> ?

Open SimonSapin opened this issue 6 years ago • 3 comments

With the switch to Hashbrow, std::mem::size_of::<std::collections::HashMap<(), ()>>() on 64-bit platforms grew from 40 bytes to 56. (24 to 40 bytes with BuildHasherDefault.)

In Servo’s DOM implementation we have types whose size_of is several hundreds of bytes. Because some not-so-unusual pages can have very many DOM nodes this size can add up to significant memory usage. We have unit tests for size_of to ensure it does not accidentally grow, which fail in today’s Rust Nightly because several types grew by 16 bytes because they contain a HashMap.

Hashbrown’s HashMap contains a RawTable which has five pointer-sized fields:

https://github.com/rust-lang/hashbrown/blob/7e79b0c33ec434857e46a8ff3d4ece413c21aaba/src/raw/mod.rs#L328-L348

Some of them seem potentially redundant, but I’m not sure. For example there are two pointers that seem to be in the same allocation. How expensive would it be to re-compute the second pointer every time it is needed?

Are bucket_mask, growth_left, and len related such that one of them could be computed from the others?

SimonSapin avatar Apr 25 '19 11:04 SimonSapin

The data pointer could be derived from the ctrl pointer, which would save 8 bytes. However the other fields can't be derived from each other and must be kept.

Amanieu avatar Apr 25 '19 14:04 Amanieu

I think it would be much more interesting to investigate moving to a layout where all of the fields are in the same allocation as the keys and hashes, with an empty singleton to avoid allocating empty maps. As is done with thin-vec and BTreeMap.

This would make it much cheaper to move HashMaps around, and would dramatically reduce the memory footprint of things which only sometimes actually create a real map. (Option<HashMap>, Vec<HashMap>, or just a rarely used HashMap::new()).

It's non-trivial work though, because it requires carefully ensuring you never mess up the CoW nature of the singleton, as well as carefully ensuring that you never make the size/align mismatch the allocation. That said I think it might be possible, as we should be able to avoid ever looking at the keys for a static ctrl array that's all EMPTY. The BuildHasher is the most annoying part here I think. I think it has to be inline. But that's fine for everyone using FxHasher or any other BuildHasherDefault, which is zero-sized. Sadly RandomState (std's default) needs 16 bytes for the seed.

Gankra avatar Apr 26 '19 17:04 Gankra

As an aside this would be an exciting experiment because we would have a very optimized implementation to compare the performance difference of this "trick" to in an apples-to-apples way. I'm not aware of any detailed analysis of the second and third-order effects of this optimization, like cache locality and whether it upsets the optimizer, which have historically been reasons to hand-wring over it.

Gankra avatar Apr 26 '19 17:04 Gankra