dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

fix(zset): fix random in ZRANDMEMBER command

Open BagritsevichStepan opened this issue 9 months ago • 3 comments

Fixes #2850

The implementation of the ZRANDMEMBER command has two cases:

  1. In the first case, the command is executed in O(n * log(m)), where n is the number of random elements, m is the size of the sorted set. In this case, n indexes are generated using the Robert Floyd's sampling algorithm and then obtained from the set.
  2. In the second case (if n * log(m) is greater than m), the command is executed in O(m). In this case, the algorithm iterates through the entire sorted set

More about the Robert Floyd's sampling algorithm:

BagritsevichStepan avatar May 01 '24 14:05 BagritsevichStepan

Hi @BagritsevichStepan - thank you very much for this PR. I understand that you are aware of the tradeoffs - you specified them in the PR description. You decided to implement a high quality random selection, to preserve the uniformity properties of the algorithm but have more O(n logM) runtime complexity and O(n) space complexity.

Can you explain why do you think a naive " Simple random sampling with replacement" (SRSWR) algorithm won't work here?

romange avatar May 01 '24 18:05 romange

Hi @BagritsevichStepan - thank you very much for this PR. I understand that you are aware of the tradeoffs - you specified them in the PR description. You decided to implement a high quality random selection, to preserve the uniformity properties of the algorithm but have more O(n logM) runtime complexity and O(n) space complexity.

Can you explain why do you think a naive " Simple random sampling with replacement" (SRSWR) algorithm won't work here?

Hi @romange, Here's how random indexes generation works: 1. If count < 0 (repeating elements are possible), then the naive SRSWR algorithm you mentioned is used. But I noticed that in this case I am creating an additional array of n elements (n is the number of members returned). I fixed this in the last commit, so no more additional memory is used. 2. If count >= 0 (no repeated elements are returned), then the Robert Floyd's sampling algorithm is used. In this case, I generate n random indexes in O(n). The space complexity is O(n).

Then the algorithm finds the elements from the sorted set by the generated indexes. There are also two options here:

  1. In the first case, the algorithm gets the elements from the sorted set by making a query for each generated index. The time complexity of each query is O(log m) (m is the number of elements in the sorted set). In total O(n * log m).
  2. In the second case, the algorithm gets all the elements from the sorted set as an array. And then it gets each element by index from that array. In this case, time complexity is O(m).

If I'm not mistaken, the current implementation of HRANDFIELD command uses only the second case and thus has O(m) time complexity. However, in some cases O(n * log(m)) can be faster.

Note: The HRANDFIELD is similar to ZRANDMEMBER, so I used its implementation as an example.

BagritsevichStepan avatar May 02 '24 15:05 BagritsevichStepan

I fixed the code review comments. @adiholden @dranikpg please take a second look, thank you!

BagritsevichStepan avatar May 05 '24 13:05 BagritsevichStepan

@BagritsevichStepan the code looks good but I think there is a problem with a rebase you did as I see changes in this PR from another PR, please check this

adiholden avatar May 05 '24 18:05 adiholden