dice Explore ways to make `HSCAN` command execute with O(1) complexity.

The current HSCAN implementation executes with O(N) complexity, which is different from Redis that operates with O(1) complexity.

We should explore ways to optimize our implementation.

Oct 06 '24 18:10 JyotinderSingh

@JyotinderSingh I would like to work on this can you please assign this to me

Oct 06 '24 18:10 raghavbabbar

@JyotinderSingh would like to work on it. Please assign it to me

Oct 06 '24 19:10 swarajrb7

@JyotinderSingh . I can contribute to it

Oct 06 '24 22:10 N3XT14

@JyotinderSingh most of the issues assigned to me are in review state. I can take this one up next

Oct 07 '24 02:10 ayushsatyam146

@raghavbabbar assigned, thanks for contributing.

Oct 07 '24 05:10 lucifercr07

@raghavbabbar Are you working on this? If not, I can take this up.

Going through SCAN implementation of Redis

Oct 19 '24 07:10 SyedMa3

@SyedMa3 i am working on it. Will update soon here

Oct 19 '24 08:10 raghavbabbar097

Hello @raghavbabbar,

There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

Oct 24 '24 13:10 arpitbbhayani

Will update here soon @arpitbbhayani in 2 or 3 days. thanks

Oct 26 '24 10:10 raghavbabbar

Hi @JyotinderSingh @arpitbbhayani

I’ve explored the Redis codebase to understand how HSCAN works, especially in handling large hash maps without compromising performance. Here’s a summary of Redis’s approach and how it can help us optimize our current implementation.

Current Problem in Our Code

Sorting Keys with Each HSCAN Call Currently, in every HSCAN call, we sort the keys of the map to maintain consistency with the cursor (Code reference).
Increased Latency Sorting incurs significant latency, especially when there are many keys.

How Redis Solves This Problem

Custom Hash Map Implementation Redis uses a custom hash map implementation that doesn’t require sorting for each scan, which helps reduce latency.
Bit-Reversal Technique Redis leverages the bit-reversal technique to iterate over its hash map in a non-sequential order, achieving consistency without sorting.

What is Bit-Reversal?

The bit-reversal technique provides a way to traverse a hash map non-sequentially. Rather than following a linear scan, bit-reversal allows the cursor to cover different sections of the hash table. This minimizes the impact of changes in the table structure, especially useful during rehashing.

Example of Bit-Reversal-Based Cursor Traversal Suppose we start with a cursor at position 0 and a hash map size of 8. The cursor traversal pattern would look like this (with bit-reversal and increment operations): 0 → 4 → 2 → 6 → 1 → 5 → 3 → 7 → 0

How the Bit-Reversal Works

Reverse all the bits of the current cursor.
Increment the cursor by one.
Reverse the bits again.

This approach ensures we visit each position in a non-sequential order, which helps avoid revisiting entries unnecessarily and is more adaptable when the table resizes.

Rehashing and Cursor Behavior

When the hash table’s size is increased (doubled), all elements can potentially change positions. Redis hash tables always have sizes that are powers of two, which simplifies calculations when resizing.

Example of Rehashing with Bit-Reversal

Original Hash Table (Size = 4) Let’s assume some element hashes: HASH('A') = 0 HASH('B') = 6 HASH('D') = 3 The positions in the hash table are based on HASH(element) % size_of_table. In a table of size 4, we get:
```
   00 → 'A'
   01 → 'B'
   10 → NULL
   11 → 'D'
```

2. After Rehashing (Size = 8)
Upon resizing, the hash table doubles to 8, and elements now map to positions by their new mask:
    ```
    000 → 'A'
    001 → NULL
    010 → NULL
    011 → 'D'
    100 → NULL
    101 → NULL
    110 → 'B'
    111 → NULL

Here, element B moved from position 01 to 110.

If we observe above the position of ‘B’ is changed in new Hash table So an element if is at position ‘XX’, after rehashing it can go it either ‘0XX’ (same) or ‘1XX’. So it’s possible in case of HSCAN to return duplicate entries.

Key Takeaways

No Sorting Needed: Bit-reversal traversal eliminates the need for sorting keys with each scan, reducing latency.
Possible Duplicates: HSCAN may return duplicate entries due to the non-linear traversal and rehashing behavior.
Missed Entries: If an element is inserted into a previously visited position during a scan, it may not appear in the ongoing scan.
Custom Hash Map Needed: To replicate Redis’s efficiency, implementing a custom hash

We can discuss further.

Nov 02 '24 20:11 raghavbabbar

Thanks for looking into this @raghavbabbar. Implementing a custom hashmap is a big undertaking, and it will be hard to compete with the built-in map's performance characteristics. This will require a very large design review before we can proceed.

For now, it seems like this isn't something we can prioritise given all the other high priority tasks that remain. Maybe we can pick this up later when we have more bandwidth.

Nov 03 '24 02:11 JyotinderSingh

Thanks for looking into this and summarising it in such detailed manner for everyone. I agree with @JyotinderSingh on parking this for now. However this will really aid me in thinking my design for a new hashmap with expiry. While that would not be a customised version, I would be designing it in way that future refactors wouldn't need changes into eval functions.

Nov 03 '24 05:11 apoorvyadav1111