nebula icon indicating copy to clipboard operation
nebula copied to clipboard

Reservoir sampling probability inaccuracy

Open chzhoo opened this issue 4 months ago • 0 comments

Please check the FAQ documentation before raising an issue

Describe the bug (required)

Your Environments (required)

  • OS: Linux 8cef9170aed4 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler: g++ (Nebula Graph Build) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

How To Reproduce(required)

Steps to reproduce the behavior:

#include "common/algorithm/ReservoirSampling.h" ...... for (size_t time = 0; time < 1024; time++) { ReservoirSampling<int64_t> sampler(1); sampler.sampling(1); sampler.sampling(2); auto result = sampler.samples(); //The sampling result only contains the number 2, not the number 1. std::cout << "sample result: " <<result[0] << std::endl; }

Expected behavior

The sampling result only contains the number 1 and the number 2.

Additional context

The <go_statement> SAMPLE <sample_list> statement employs the reservoir sampling algorithm, but the current implementation exhibits probability inaccuracies.

chzhoo avatar Aug 14 '25 08:08 chzhoo