dragonfly
dragonfly copied to clipboard
fix(zset): fix random in ZRANDMEMBER command
Fixes #2850
The implementation of the ZRANDMEMBER
command has two cases:
- In the first case, the command is executed in
O(n * log(m))
, wheren
is the number of random elements,m
is the size of the sorted set. In this case,n
indexes are generated using the Robert Floyd's sampling algorithm and then obtained from the set. - In the second case (if n * log(m) is greater than m), the command is executed in
O(m)
. In this case, the algorithm iterates through the entire sorted set
More about the Robert Floyd's sampling algorithm:
Hi @BagritsevichStepan - thank you very much for this PR. I understand that you are aware of the tradeoffs - you specified them in the PR description. You decided to implement a high quality random selection, to preserve the uniformity properties of the algorithm but have more O(n logM)
runtime complexity and O(n)
space complexity.
Can you explain why do you think a naive " Simple random sampling with replacement" (SRSWR) algorithm won't work here?
Hi @BagritsevichStepan - thank you very much for this PR. I understand that you are aware of the tradeoffs - you specified them in the PR description. You decided to implement a high quality random selection, to preserve the uniformity properties of the algorithm but have more
O(n logM)
runtime complexity andO(n)
space complexity.Can you explain why do you think a naive " Simple random sampling with replacement" (SRSWR) algorithm won't work here?
Hi @romange,
Here's how random indexes generation works:
1. If count < 0
(repeating elements are possible), then the naive SRSWR algorithm you mentioned is used. But I noticed that in this case I am creating an additional array of n
elements (n
is the number of members returned). I fixed this in the last commit, so no more additional memory is used.
2. If count >= 0
(no repeated elements are returned), then the Robert Floyd's sampling algorithm is used. In this case, I generate n
random indexes in O(n)
. The space complexity is O(n)
.
Then the algorithm finds the elements from the sorted set by the generated indexes. There are also two options here:
- In the first case, the algorithm gets the elements from the sorted set by making a query for each generated index. The time complexity of each query is
O(log m)
(m
is the number of elements in the sorted set). In totalO(n * log m)
. - In the second case, the algorithm gets all the elements from the sorted set as an array. And then it gets each element by index from that array. In this case, time complexity is
O(m)
.
If I'm not mistaken, the current implementation of HRANDFIELD command uses only the second case and thus has O(m)
time complexity. However, in some cases O(n * log(m))
can be faster.
Note: The HRANDFIELD
is similar to ZRANDMEMBER
, so I used its implementation as an example.
I fixed the code review comments. @adiholden @dranikpg please take a second look, thank you!
@BagritsevichStepan the code looks good but I think there is a problem with a rebase you did as I see changes in this PR from another PR, please check this