java-consistent-hashing-algorithms
Unexpected computational complexity
Hi!
I am playing around with MementoHash and investigating the average runtime when most buckets are removed. As an extreme case, I started with 100k buckets and randomly removed all except one bucket. For this case, the number of nested loop iterations in my experiment contradicts Proposition 18 in the paper.
The experiment can be reproduced with this unit test, which fails here, where the measured average number of iterations differs by more than 100 standard deviations from the number of iterations expected according to Proposition 18. Do you have any explanation for this discrepancy?
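For reference, the statistical check described above can be sketched as follows (a minimal illustration; the expectation and standard deviation according to Proposition 18 are not reproduced here and enter only as placeholder parameters):

```java
// Minimal sketch of the statistical check used by the unit test: the measured
// average number of iterations must lie within a given number of standard
// deviations of the theoretical expectation. The expectation and the standard
// deviation according to Proposition 18 are NOT reproduced here; they enter
// only as placeholder parameters.
static boolean isWithinTolerance(
    double measuredAverage,     // average iterations measured over all keys
    long numKeys,               // number of random keys used in the experiment
    double expectedIterations,  // expected iterations per lookup (placeholder)
    double standardDeviation,   // std. dev. of iterations per lookup (placeholder)
    double maxNumStdDevs) {     // tolerance, e.g. 100 standard deviations
  // the standard deviation of the average decreases with sqrt(numKeys)
  double stdDevOfAverage = standardDeviation / Math.sqrt(numKeys);
  return Math.abs(measuredAverage - expectedIterations) <= maxNumStdDevs * stdDevOfAverage;
}
```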
Best regards Otmar
Dear Otmar,
Thank you so much for your active contribution to our project.
The numbering of propositions in our paper is section-based, so there is no Proposition 18. Could you please let us know which proposition you're referring to in the version published on arXiv?
Best regards, Massimo
I refer to Prop. 18 (last paragraph of page 10) in the published version (https://doi.org/10.1109/TNET.2024.3393476).
Best regards Otmar
Dear Otmar,
I’ve analyzed your test with respect to Proposition 18, and it appears that the theoretical bound is indeed exceeded in practice when the number of removed nodes surpasses 99.95%. We didn’t detect this inconsistency earlier because our tests only went up to 99% node removal.
By tracking the iterations of the outer loop, it becomes clear that the issue originates in the inner loop. This can be attributed to the fact that removing nodes causes the replacement chains to grow increasingly long. When nearly all nodes are removed, these chains can extend to become a single replacement chain covering all the failed nodes.
Since the inner loop iterates over a replacement chain, it's expected that the average number of iterations increases substantially as the number of removed nodes approaches the total.
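A deliberately simplified, self-contained sketch (not the actual MementoHash lookup) of how such a single replacement chain makes the lookup cost linear in the number of removed buckets:

```java
import java.util.HashMap;
import java.util.Map;

// Deliberately simplified sketch (not the actual MementoHash code): each removed
// bucket stores a replacement entry, and a lookup that lands on a removed bucket
// has to walk these entries. If the removals chain into one another, a single
// lookup walks a chain whose length grows with the number of removed buckets.
public class ReplacementChainSketch {

  public static void main(String[] args) {
    int numBuckets = 100_000;
    Map<Integer, Integer> replacement = new HashMap<>();

    // remove buckets 1 .. numBuckets-1 such that each removed bucket is replaced
    // by its predecessor, yielding one long chain that ends at bucket 0
    for (int b = numBuckets - 1; b >= 1; --b) {
      replacement.put(b, b - 1);
    }

    // a lookup that initially lands on a removed bucket must follow the chain
    int bucket = numBuckets - 1;
    long iterations = 0;
    while (replacement.containsKey(bucket)) {
      bucket = replacement.get(bucket);
      ++iterations;
    }
    System.out.println("chain length = " + iterations); // prints 99999
  }
}
```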
We designed the algorithm to perform efficiently in stable production environments where failures are rare. Although we tested up to 99% node removal, our intended use case involves failure rates below 20%.
MementoHash is therefore not suitable for highly dynamic environments like peer-to-peer networks, where nodes frequently and randomly join and leave.
Best regards, Massimo
Dear Massimo,
Thanks for your analysis and confirmation! Interestingly, we did not see similar behavior for the memory-optimized version of AnchorHash when removing almost all buckets in random order. Only very specific bucket removal orders lead to linear runtime. However, just by chance, buckets are unlikely to become unavailable in such an adversarial order. Therefore, AnchorHash's behavior seems better suited to such extreme cases.
This was the main reason we decided to implement a variant of AnchorHash for Hash4j v0.22.0, combining it with the idea of MementoHash to avoid having to define the maximum number of buckets upfront, and with JumpBackHash. The nice thing is that the random generator already used for JumpBackHash can simply be reused for the AnchorHash part instead of performing additional hash computations.
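To illustrate the idea of reusing the random generator, here is a deliberately simplified, self-contained sketch (not the actual Hash4j implementation; both the initial selection and the resolution of removed buckets are replaced by trivial stand-ins):

```java
import java.util.BitSet;
import java.util.SplittableRandom;

// Conceptual sketch of reusing a single pseudorandom generator (not the actual
// Hash4j implementation): the generator is seeded from the key's hash, drives
// the initial bucket selection (JumpBackHash in the real code, a plain uniform
// draw here), and is then reused to resolve removed buckets (an AnchorHash-style
// resolution in the real code, a naive re-draw here).
public class PrngReuseSketch {

  static int getBucket(long keyHash, int arraySize, BitSet removed) {
    SplittableRandom prng = new SplittableRandom(keyHash);
    int bucket = prng.nextInt(arraySize); // initial selection over the full range
    while (removed.get(bucket)) {
      // reuse the same generator; no additional hash computation of the key needed
      bucket = prng.nextInt(arraySize);
    }
    return bucket;
  }

  public static void main(String[] args) {
    BitSet removed = new BitSet();
    removed.set(3);
    removed.set(7);
    System.out.println(getBucket(0x9E3779B97F4A7C15L, 10, removed));
  }
}
```

The naive re-drawing loop in this sketch stays consistent but degrades when most buckets are removed, which is exactly the part that the AnchorHash-style resolution handles more efficiently.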
This unit test shows the significantly better expected runtime when almost all nodes are removed in random order, while this unit test demonstrates the worst-case runtime for an adversarial bucket removal order.
Thank you and best regards, Otmar
Dear Otmar,
I’ve pushed this variant of Memento, named Memento9995, which preserves the theoretical constraints even beyond the 99.95% threshold of removed nodes. I’ve also included your test case, so you can verify it directly.
This version uses about 20% more memory. For this reason, we ultimately decided not to make it the official release. Besides implying greater resource usage, more memory also leads to more cache misses, which can significantly degrade performance. In fact, even algorithms that are asymptotically faster can be slower in practice due to such cache inefficiencies (which is exactly the case for AnchorHash). Therefore, we prioritized memory efficiency for general use cases over maintaining performance in a rare edge case.
Furthermore, Memento is designed with a specific usage pattern in mind: scaling the cluster by adding and removing nodes in LIFO order. Its support for arbitrary removals is mainly intended for failure handling. When used this way, the data structure and the lookup time remain small (nearly constant if used in combination with JumpBackHash).
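A deliberately simplified sketch of why the LIFO pattern keeps the data structure small (again not the actual Memento code, and the bookkeeping stored in the else branch is only a placeholder):

```java
import java.util.HashMap;
import java.util.Map;

// Deliberately simplified sketch of the LIFO usage pattern (not the actual
// Memento code): removing the most recently added bucket just shrinks the
// bucket range, so no replacement entries accumulate.
public class LifoScalingSketch {

  private int numBuckets = 0;
  private final Map<Integer, Integer> replacementEntries = new HashMap<>();

  int addBucket() {
    return numBuckets++; // new buckets always get the next free id
  }

  void removeBucket(int bucket) {
    if (bucket == numBuckets - 1) {
      numBuckets--; // LIFO removal: no bookkeeping required
    } else {
      // arbitrary removal (failure handling): bookkeeping is required;
      // the entry stored here is only a placeholder for the real metadata
      replacementEntries.put(bucket, numBuckets - 1);
    }
  }

  public static void main(String[] args) {
    LifoScalingSketch sketch = new LifoScalingSketch();
    for (int i = 0; i < 1000; ++i) sketch.addBucket();     // scale up
    for (int i = 999; i >= 0; --i) sketch.removeBucket(i); // scale down in LIFO order
    System.out.println(sketch.replacementEntries.size());  // prints 0
  }
}
```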
From what I see, your library is shaping up to be a comprehensive solution for various hashing use cases. Please correct me if I’m wrong, but it seems you’re taking an approach that prioritizes performance in scenarios where most nodes have been removed (opposite to ours, which favors performance when most nodes are active). I’d be very interested to understand more about your target use cases.
Thanks and best regards, Massimo
Dear Massimo!
Thank you! Yes, Memento9995 behaves as theoretically expected. The logic of the bucket lookup seems very similar to that of the memory-optimized version of AnchorHash, which we used as the basis of our Hash4j implementation. While AnchorHash stores the information in arrays that are sparse if the fraction of removed buckets is low, MementoHash stores the same information in a map.
In Hash4j, we decided to use the array approach as memory is not really an issue. In the worst case, you need to store a long[] array with a length equal to the historic maximum number of buckets. In practice, you also need to store additional information that associates the logical bucket IDs with the actual resources. The memory requirements of this additional mapping scale linearly with the number of buckets. Hence, if you can afford to keep this additional mapping in memory when the number of buckets equals the historic maximum, then it should not be a big problem to also keep a long[] array of that size in memory.
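To sketch the memory argument (class and field names below are purely illustrative, not actual Hash4j types): the per-bucket resource mapping already scales linearly with the historic maximum number of buckets, so an additional long[] of the same length is of the same order of magnitude.

```java
// Illustrative sketch of the memory argument (names are not actual Hash4j types):
// the mapping from logical bucket ids to resources already needs one reference
// per bucket, so an additional long[] whose length equals the historic maximum
// number of buckets is of the same order of magnitude.
public class BucketAssignment {

  // e.g. the address of the node/shard serving each logical bucket;
  // one reference per bucket, scaling with the number of buckets
  private final String[] resourceByBucketId;

  // state of the consistent hash function, one long per bucket
  // (worst case: length equals the historic maximum number of buckets)
  private final long[] consistentHashState;

  BucketAssignment(int historicMaxNumBuckets) {
    this.resourceByBucketId = new String[historicMaxNumBuckets];
    this.consistentHashState = new long[historicMaxNumBuckets];
  }
}
```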
You're right that cache misses could make the array approach slower. This is why we packed the information of the two integer arrays A and K of AnchorHash into a single long[] array. We might test the performance of our array-based implementation against a map-based one in the future. However, I don't expect a big improvement in lookup time for fewer than 1000 buckets, especially if the time needed to determine the actual resource from the logical bucket ID is also included. In any case, if we see a benefit of using the map approach, we still have the option to change the implementation later without a breaking change, as it is just a different way of storing the same information.
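The packing mentioned above can be sketched as follows (a minimal illustration, not the actual Hash4j code): the corresponding entries of the two 32-bit arrays A and K are stored in the upper and lower halves of a single long, so both values are loaded with a single array access.

```java
// Minimal sketch of packing the two int arrays A and K of AnchorHash into a
// single long[] (not the actual Hash4j code): A[i] is stored in the upper
// 32 bits and K[i] in the lower 32 bits, so both values are fetched with a
// single array access and typically a single cache line.
final class PackedAnchorState {

  private final long[] packed;

  PackedAnchorState(int[] a, int[] k) {
    packed = new long[a.length];
    for (int i = 0; i < a.length; ++i) {
      packed[i] = ((long) a[i] << 32) | (k[i] & 0xFFFFFFFFL);
    }
  }

  int getA(int i) {
    return (int) (packed[i] >>> 32); // upper 32 bits
  }

  int getK(int i) {
    return (int) packed[i]; // lower 32 bits
  }
}
```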
Best regards, Otmar