hashbrown icon indicating copy to clipboard operation
hashbrown copied to clipboard

Add `RawTable::vacuum` to clean up DELETED entries

Open cuviper opened this issue 4 years ago • 7 comments

This cleans the table to ensure the maximum usable capacity in its current allocation, rehashing items that need to move around previously-deleted entries.

cuviper avatar Apr 05 '21 17:04 cuviper

This was inspired by bluss/indexmap#183, and the name by a database VACUUM, like PostgreSQL or SQLite. If there's interest, I could also add methods to HashMap and HashSet.

cuviper avatar Apr 05 '21 17:04 cuviper

I have doubts about how useful this will be in practice. In almost all cases reserve_rehash will do the right thing and call rehash_in_place instead of growing the table. The heuristic in reserve_rehash will grow the table if it is more than half full, which avoids situations where you end up calling rehash_in_place on every insert if the table is at capacity and you are constantly removing & adding one item.

Amanieu avatar Apr 05 '21 18:04 Amanieu

I'm trying to provide an option for the concern in https://github.com/bluss/indexmap/issues/183#issuecomment-809614807:

OK, so I guess that means I shouldn't use IndexMap in a scenario where I don't want to trigger memory allocations after the initial creation of the map?

In that scenario, I suppose they might be working more than half full, where reserve_rehash would want to grow. I'm not proposing to automatically vacuum, because the growth heuristic is probably better in general, but it may still be useful in memory-constrained environments.

cuviper avatar Apr 05 '21 18:04 cuviper

You could just reserve 2x the needed capacity, which ensures that reserve_rehash will always use the rehash_in_place path.

Or just don't do anything: in the worst case you get a single reallocation, after which the rehash_in_place path is always used.

Amanieu avatar Apr 05 '21 18:04 Amanieu

To provide a bit more context, the memory constrained environment I'm working in is an audio library. Generally people working with audio don't want to allocate memory on the audio thread, because if the operating system takes a long time to allocate memory, it could result in audio stuttering, which is very unpleasant. So once I create a hash map, I never want to allocate memory for it again.

It may just be that I need a more specialized data structure.

tesselode avatar Apr 06 '21 06:04 tesselode

As I've said before, hashbrown will automatically clean up deleted entries when it runs out of capacity so it won't call out to the allocator to grow the table unless you actually need it.

Amanieu avatar Apr 06 '21 06:04 Amanieu

:umbrella: The latest upstream changes (presumably #458) made this pull request unmergeable. Please resolve the merge conflicts.

bors avatar Aug 24 '23 20:08 bors