Explore ways to make `HSCAN` command execute with O(1) complexity.
The current HSCAN implementation executes with O(N) complexity, which is different from Redis that operates with O(1) complexity.
We should explore ways to optimize our implementation.
@JyotinderSingh I would like to work on this can you please assign this to me
@JyotinderSingh would like to work on it. Please assign it to me
@JyotinderSingh . I can contribute to it
@JyotinderSingh most of the issues assigned to me are in review state. I can take this one up next
@raghavbabbar assigned, thanks for contributing.
@raghavbabbar Are you working on this? If not, I can take this up.
Going through SCAN implementation of Redis
@SyedMa3 i am working on it. Will update soon here
Hello @raghavbabbar,
There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.
We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.
Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.
Thanks again.
Will update here soon @arpitbbhayani in 2 or 3 days. thanks
Hi @JyotinderSingh @arpitbbhayani
I’ve explored the Redis codebase to understand how HSCAN works, especially in handling large hash maps without compromising performance. Here’s a summary of Redis’s approach and how it can help us optimize our current implementation.
Current Problem in Our Code
-
Sorting Keys with Each HSCAN Call Currently, in every HSCAN call, we sort the keys of the map to maintain consistency with the cursor (Code reference).
-
Increased Latency Sorting incurs significant latency, especially when there are many keys.
How Redis Solves This Problem
-
Custom Hash Map Implementation Redis uses a custom hash map implementation that doesn’t require sorting for each scan, which helps reduce latency.
-
Bit-Reversal Technique Redis leverages the bit-reversal technique to iterate over its hash map in a non-sequential order, achieving consistency without sorting.
What is Bit-Reversal?
The bit-reversal technique provides a way to traverse a hash map non-sequentially. Rather than following a linear scan, bit-reversal allows the cursor to cover different sections of the hash table. This minimizes the impact of changes in the table structure, especially useful during rehashing.
Example of Bit-Reversal-Based Cursor Traversal Suppose we start with a cursor at position 0 and a hash map size of 8. The cursor traversal pattern would look like this (with bit-reversal and increment operations): 0 → 4 → 2 → 6 → 1 → 5 → 3 → 7 → 0
How the Bit-Reversal Works
- Reverse all the bits of the current cursor.
- Increment the cursor by one.
- Reverse the bits again.
This approach ensures we visit each position in a non-sequential order, which helps avoid revisiting entries unnecessarily and is more adaptable when the table resizes.
Rehashing and Cursor Behavior
When the hash table’s size is increased (doubled), all elements can potentially change positions. Redis hash tables always have sizes that are powers of two, which simplifies calculations when resizing.
Example of Rehashing with Bit-Reversal
- Original Hash Table (Size = 4)
Let’s assume some element hashes:
HASH('A') = 0
HASH('B') = 6
HASH('D') = 3
The positions in the hash table are based on HASH(element) % size_of_table. In a table of size 4, we get:
00 → 'A' 01 → 'B' 10 → NULL 11 → 'D'
2. After Rehashing (Size = 8)
Upon resizing, the hash table doubles to 8, and elements now map to positions by their new mask:
```
000 → 'A'
001 → NULL
010 → NULL
011 → 'D'
100 → NULL
101 → NULL
110 → 'B'
111 → NULL
Here, element B moved from position 01 to 110.
- If we observe above the position of ‘B’ is changed in new Hash table So an element if is at position ‘XX’, after rehashing it can go it either ‘0XX’ (same) or ‘1XX’. So it’s possible in case of HSCAN to return duplicate entries.
Key Takeaways
- No Sorting Needed: Bit-reversal traversal eliminates the need for sorting keys with each scan, reducing latency.
- Possible Duplicates: HSCAN may return duplicate entries due to the non-linear traversal and rehashing behavior.
- Missed Entries: If an element is inserted into a previously visited position during a scan, it may not appear in the ongoing scan.
- Custom Hash Map Needed: To replicate Redis’s efficiency, implementing a custom hash
We can discuss further.
Thanks for looking into this @raghavbabbar. Implementing a custom hashmap is a big undertaking, and it will be hard to compete with the built-in map's performance characteristics. This will require a very large design review before we can proceed.
For now, it seems like this isn't something we can prioritise given all the other high priority tasks that remain. Maybe we can pick this up later when we have more bandwidth.
Thanks for looking into this and summarising it in such detailed manner for everyone. I agree with @JyotinderSingh on parking this for now. However this will really aid me in thinking my design for a new hashmap with expiry. While that would not be a customised version, I would be designing it in way that future refactors wouldn't need changes into eval functions.