Reservoir sampling probability inaccuracy
Please check the FAQ documentation before raising an issue
Describe the bug (required)
Your Environments (required)
- OS:
Linux 8cef9170aed4 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux - Compiler:
g++ (Nebula Graph Build) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
How To Reproduce(required)
Steps to reproduce the behavior:
#include "common/algorithm/ReservoirSampling.h" ...... for (size_t time = 0; time < 1024; time++) { ReservoirSampling<int64_t> sampler(1); sampler.sampling(1); sampler.sampling(2); auto result = sampler.samples(); //The sampling result only contains the number 2, not the number 1. std::cout << "sample result: " <<result[0] << std::endl; }
Expected behavior
The sampling result only contains the number 1 and the number 2.
Additional context
The <go_statement> SAMPLE <sample_list> statement employs the reservoir sampling algorithm, but the current implementation exhibits probability inaccuracies.