Update dependency com.tdunning:t-digest to v3.3
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| com.tdunning:t-digest | dependencies | minor | 3.1 -> 3.3 |
Release Notes
tdunning/t-digest (com.tdunning:t-digest)
v3.2
=========== In release 3.2, the goal is to produce an update to the code given the large number of improvements since the previous release.
There are a few bugs that will survive this release, most notably in the AVLTreeDigest. These have to do with large numbers of repeated data points and are not new bugs.
There is also a lot of work going on with serialization. I need to hear from people about what they are doing with serialization so that we can build some test cases to allow an appropriate migration strategy to future serialization.
The paper continues to be updated. The algorithmic descriptions are getting reasonably clear, but the speed and accuracy sections need a complete revamp with current implementations.
Bugs, fixed and known
Fixed
The following important issues are fixed in this release
Issue #90 Serialization for MergingDigest
Issue #92 Serialization for AVLTreeDigest
Maybe fixed
This issue has substantial progress, but lacks a definitive test to determine whether it should be closed.
Issue 78 Stability under merging.
Pushed
The following issues are pushed beyond this release
Issue #87 Future proof and extensible serialization
Issue #89 Bad handling for duplicate values in AVLTreeDigest
All fixed issues
Here is a complete list of issues resolved in this release:
Issue #55 Add time decay to t-digest
Issue #52 General factory method for "fromBytes"
Issue #90 Deserialization of MergingDigest BufferUnderflowException in 3.1
Issue #92 Error in AVLTreeDigest.fromBytes
Issue #93 high centroid frequency causes overflow - giving incorrect results
Issue #67 Release of version 3.2
Issue #81 AVLTreeDigest with a lot of datas : integer overflow
Issue #75 Adjusting the centroid threshold values to obtain better accuracy at interesting values
Issue #74 underlying distribution : powerlaw
Issue #72 Inverse quantile algorithm is non-contiguous
Issue #65 totalDigest add spark dataframe column / array
Issue #60 Getting IllegalArgumentException when adding digests
Issue #53 smallByteSize methods are very trappy in many classes -- should be changed or have warnings in javadocs
Issue #82 TDigest class does not implement Serializable interface in last release.
Issue #42 Histogram
Issue #40 Improved constraint on centroid sizes
Issue #37 Allow arbitrary scaling laws for centroid sizes
Issue #29 Test method testScaling() always adds values in ascending order
Issue #84 Remove deprecated kinds of t-digest
Issue #76 Add serializability
Issue #77 Question: Proof of bounds on merging digest size
Issue #71 Simple alternate algorithm using maxima, ranks and fixed cumulative weighting
Issue #61 Possible improvement to the speed of the algorithm
Issue #58 jdk8 doclint incompatibility
Issue #48 Build is unstable under some circumstances
Issue #63 Which TDigest do you recommend?
Issue #62 Very slow performance; what am I missing?
Issue #47 Make TDigest serializable
Issue #49 MergingDigest.centroids is wrong on an empty digest
Configuration
📅 Schedule: Branch creation - "* * * * *" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
- [ ] If you want to rebase/retry this PR, check this box
This PR has been generated by Renovate Bot
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
So turns out the Streaming tests fail consistently with this upgrade. Have not dived into it, but there must be some incompatible behavior between these versions causing this. Anyone knows why?
t-digest 3.2 looks fine but 3.3 seems to have different stats assertions? The changes look to be around percentile.
The nitty gritty details look to be here:
https://github.com/tdunning/t-digest/issues/171
and
https://github.com/opensearch-project/OpenSearch/pull/3634
basically for small sample sizes there used to be a bug and now its fixed. That seems to be my rough interpretation so its not surprising that our tests are failing.
Pushed e0d36967626d1e261c172739974d534dc81ce219 updating the assertions in the percentiles for tests.
Edited/Blocked Notification
Renovate will not automatically rebase this PR, because it does not recognize the last commit author and assumes somebody else may have edited the PR.
You can manually request rebase by checking the rebase/retry box above.
⚠ Warning: custom changes will be lost.