t-digest
Help users of t-digest upgrade
The following important projects are using older t-digest versions and should probably upgrade. An asterisk marks projects that have had a pull request or notification.
3.2:
- elastic: server
- atlassian bamboo core
- datavec api
- apache drill
- apache pinot core
- ML Dataset
- spotify metrics core
- apache servicemix
- apache nutch
- apache beam sdks java extensions sketching
- kixi stats
- apache druid extensions
- deeplearning4j modelexport solr
3.1:
- apache solr
- Apache Solr Content Extraction Library
- Apache Solr Language Identifier
- Apache Solr Prometheus Exporter Package
- apache mahout math
- apache kylin
- Kite Morphlines Metrics Scalable
- Brushfire Core (from stripe)
I am upgrading OpenSearch, https://github.com/opensearch-project/OpenSearch/pull/3634.
We have the following assertion in tests with 3.2:
final TDigestState state = new TDigestState(100);
Arrays.stream(values).forEach(state::add);
assertEquals(state.centroidCount(), values.length);
This is no longer the case with 3.3. @tdunning, could you please help me understand what changed? There are more failures after this, including very different percentiles returned with this upgrade.
I think this test is flawed because it relies on unstated assumptions.
The number of centroids was always defined as <= the number of samples inserted. That is the entire point of the sketch ... to not store everything.
What may have changed is that the limit on centroid count has been made a bit more strict.
I could comment more intelligently if I knew what was in values.
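To illustrate why a centroid count equal to the sample count cannot be relied on, here is a minimal sketch of a bounded-size summary. It is a toy and not the t-digest algorithm: the class name `ToySketch`, the `maxCentroids` limit, and the merge-the-closest-pair rule are all assumptions made up for this example. It only shows the general behaviour that the centroid count matches the sample count while the input is small and drops below it once a size limit kicks in.

```java
import java.util.ArrayList;
import java.util.List;

// Toy bounded-size sketch. NOT the t-digest algorithm; names and the
// merge rule are hypothetical and exist only for illustration.
class ToySketch {
    private final int maxCentroids; // hypothetical hard limit on stored centroids
    // Each entry is {mean, weight}; the list is kept sorted by mean.
    private final List<double[]> centroids = new ArrayList<>();

    ToySketch(int maxCentroids) {
        if (maxCentroids < 1) {
            throw new IllegalArgumentException("maxCentroids must be >= 1");
        }
        this.maxCentroids = maxCentroids;
    }

    void add(double x) {
        // Insert the sample as a singleton centroid at its sorted position.
        int i = 0;
        while (i < centroids.size() && centroids.get(i)[0] < x) {
            i++;
        }
        centroids.add(i, new double[] {x, 1.0});
        // Once over the limit, give up one centroid by merging two neighbours.
        if (centroids.size() > maxCentroids) {
            mergeClosestPair();
        }
    }

    private void mergeClosestPair() {
        int best = 0;
        double bestGap = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < centroids.size(); i++) {
            double gap = centroids.get(i + 1)[0] - centroids.get(i)[0];
            if (gap < bestGap) {
                bestGap = gap;
                best = i;
            }
        }
        double[] left = centroids.get(best);
        double[] right = centroids.remove(best + 1);
        double weight = left[1] + right[1];
        // The weighted mean lies between the two old means, so order is preserved.
        left[0] = (left[0] * left[1] + right[0] * right[1]) / weight;
        left[1] = weight;
    }

    int centroidCount() {
        return centroids.size();
    }

    public static void main(String[] args) {
        ToySketch small = new ToySketch(100);
        for (int i = 0; i < 50; i++) {
            small.add(i);
        }
        ToySketch large = new ToySketch(100);
        for (int i = 0; i < 1000; i++) {
            large.add(i);
        }
        System.out.println(small.centroidCount()); // 50: under the limit, one centroid per sample
        System.out.println(large.centroidCount()); // 100: capped, far fewer than 1000 samples
    }
}
```

A test written against a summary like this would be safer asserting `centroidCount() <= values.length` than asserting equality, since equality holds only while the input stays under whatever limit the implementation applies.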
Thanks @tdunning. I tagged you in https://github.com/opensearch-project/OpenSearch/pull/3634 that has source data from basic tests, and the diff in results from the upgrade. Could I please ask you to take a look? I certainly didn't expect such different numbers from an upgrade of a dot release of t-digest.