t-digest icon indicating copy to clipboard operation
t-digest copied to clipboard

big difference in ArrayDigest vs AVLTreeDigest quantile result

Open xysun opened this issue 8 years ago • 5 comments

Hi,

I'm running a very simple simulation and surprisingly found some big difference in quantile result from ArrayDigest and AVLTreeDigest. AVLTreeDigest is giving less accurate results.

Simulation setup:

  1. send 20k measurement, where each measurement has 0.1% chance of being 100, and 99.9% chance of being 5
  2. create a ArrayDigest or AVLTreeDigest with compression = 15
  3. Expected result should be: 95-99 quantile = 5; 99.9 quantile = 100

Result (from one simulation run, but each run is very close to each other):

ArrayDigest:

0.95, quantile value 5.0
0.96, quantile value 5.0
0.97, quantile value 5.0
0.98, quantile value 5.0
0.99, quantile value 5.0
0.999, quantile value 100.0

AVLTreeDigest:

0.95, quantile value 5.0
0.96, quantile value 5.0
0.97, quantile value 5.0
0.98, quantile value 8.790523690773068
0.99, quantile value 56.1720698254364
0.999, quantile value 98.81546134663341

Is this expected?

Code can be found here

xysun avatar May 26 '17 04:05 xysun

Very good observation. What you are seeing is a serious bug that is related to the handling of duplicate values in the AVLTreeDigest.

The ArrayDigest is now deprecated, but the good news is that the new MergingDigest behaves correctly. Here is the new test, similar to what you have, but a bit more extreme:

  public void testSingletonInACrowd() {
        final double compression = 100;
        TDigest dist = factory(compression).create();
        for (int i = 0; i < 10000; i++) {
            dist.add(10);
        }
        dist.add(20);
        dist.compress();
        assertEquals(10.0, dist.quantile(0), 0);
        assertEquals(10.0, dist.quantile(0.5), 0);
        assertEquals(10.0, dist.quantile(0.8), 0);
        assertEquals(10.0, dist.quantile(0.9), 0);
        assertEquals(10.0, dist.quantile(0.99), 0);
        assertEquals(10.0, dist.quantile(0.999), 0);
        assertEquals(20.0, dist.quantile(1), 0);
    }

I think I know a way to make the AVLDigest do the right thing. As I see it, this is a blocking bug for the 3.0 release. The work-around is to use the MergingDigest.

tdunning avatar May 28 '17 17:05 tdunning

Thanks! This is very helpful.

xysun avatar May 29 '17 07:05 xysun

I am going to push this to after the 3.2 release so that 3.2 can go out.

tdunning avatar Aug 06 '17 20:08 tdunning

The new interpolation methods look like they may fix this issue.

tdunning avatar Jan 19 '19 07:01 tdunning

This is half fixed, but I won't finish checking that the fix is complete until after this 3.3 release.

tdunning avatar Apr 02 '21 04:04 tdunning