htop icon indicating copy to clipboard operation
htop copied to clipboard

Hashtable: only shrink in Put and fix various other issues

Open charlievieth opened this issue 2 years ago • 4 comments

Changes:

  • Only resize the hash in Hashtable_put to reduce thrashing, otherwise we a large deletion results in multiple incremental resizes.
  • Reduce the load factor to 2/3 from 0.7 since testing shows the average and median probe lengths increases quickly after 2/3 of capacity is used.
  • Eliminate modulo math when incrementing the index in put, get, and remove.
  • Increase the number of available prime sizes.
  • Define the hash key type as uint32_t instead of "unsigned int", which is stable across platforms and large enough to hold any PID.
  • Change Hashtable_get to take a "const" Hashtable.

This was originally meant to fix an issue with the Hashtable shrink factor that was fixed with 230dc9c3.

charlievieth avatar May 06 '22 03:05 charlievieth

Actually trying to run the code produces an infinite loop in Hashtable_put because the resizing doesn't trigger. Can you have another look?

BenBE avatar May 06 '22 05:05 BenBE

@BenBE Let's hold off on these changes for a moment. I was just reviewing the implications of only shrinking on Hashtable_put() and one of the side-effects is that a Hashtable created with an explicit size greater than MIN_TABLE_SIZE will shrink on the first call to Hashtable_put, which we obviously don't want.

The downside of moving the shrink logic back to Hashtable_remove is that when turnover is high we'd end up shrinking the table on removes only to grow it again on puts (basically doublind the work we have to do).

Have we considered changing the hash table to not automatically shrink? This would allow for adding something like Hashtable_reserve(N) (to reserve space for N items before inserts) and we could use the same compact logic that we use for Vectors (which we never shrink)?

charlievieth avatar May 08 '22 00:05 charlievieth

Couldn't you achieve this effect by setting the shrink threshold very low? The performance of the hash table won't be hurt if it is "too empty", thus down-sizing it late should have no negative impact.

As per an actual Hashtable_reserve call: I'd appreciate such a function but this is likely best left as a separate PR as the involved code changes will be quite substantial.

BenBE avatar May 08 '22 07:05 BenBE

The downside of moving the shrink logic back to Hashtable_remove is that when turnover is high we'd end up shrinking the table on removes only to grow it again on puts (basically doublind the work we have to do).

What are the cases for high turnover in htop - fighting off a fork bomb manually?

Furthermore, this likely won't be a problem for the current use in ProcessList because ProcessList_scan first adds all new processes and after that removes the expired ones, so in stable system state it would go n -> 2n -> n in worst case. Sure, we can still have resizes with highly variable process count, but that would trigger them in the Hashtable_put variant too.

tanriol avatar May 08 '22 09:05 tanriol