solr icon indicating copy to clipboard operation
solr copied to clipboard

Update dependency com.tdunning:t-digest to v3.3

Open solrbot opened this issue 2 years ago • 1 comments

This PR contains the following updates:

Package Type Update Change
com.tdunning:t-digest dependencies minor 3.1 -> 3.3

Release Notes

tdunning/t-digest (com.tdunning:t-digest)

v3.2

=========== In release 3.2, the goal is to produce an update to the code given the large number of improvements since the previous release.

There are a few bugs that will survive this release, most notably in the AVLTreeDigest. These have to do with large numbers of repeated data points and are not new bugs.

There is also a lot of work going on with serialization. I need to hear from people about what they are doing with serialization so that we can build some test cases to allow an appropriate migration strategy to future serialization.

The paper continues to be updated. The algorithmic descriptions are getting reasonably clear, but the speed and accuracy sections need a complete revamp with current implementations.

Bugs, fixed and known

Fixed

The following important issues are fixed in this release

Issue #​90 Serialization for MergingDigest

Issue #​92 Serialization for AVLTreeDigest

Maybe fixed

This issue has substantial progress, but lacks a definitive test to determine whether it should be closed.

Issue 78 Stability under merging.

Pushed

The following issues are pushed beyond this release

Issue #​87 Future proof and extensible serialization

Issue #​89 Bad handling for duplicate values in AVLTreeDigest

All fixed issues

Here is a complete list of issues resolved in this release:

Issue #​55 Add time decay to t-digest

Issue #​52 General factory method for "fromBytes"

Issue #​90 Deserialization of MergingDigest BufferUnderflowException in 3.1

Issue #​92 Error in AVLTreeDigest.fromBytes

Issue #​93 high centroid frequency causes overflow - giving incorrect results

Issue #​67 Release of version 3.2

Issue #​81 AVLTreeDigest with a lot of datas : integer overflow

Issue #​75 Adjusting the centroid threshold values to obtain better accuracy at interesting values

Issue #​74 underlying distribution : powerlaw

Issue #​72 Inverse quantile algorithm is non-contiguous

Issue #​65 totalDigest add spark dataframe column / array

Issue #​60 Getting IllegalArgumentException when adding digests

Issue #​53 smallByteSize methods are very trappy in many classes -- should be changed or have warnings in javadocs

Issue #​82 TDigest class does not implement Serializable interface in last release.

Issue #​42 Histogram

Issue #​40 Improved constraint on centroid sizes

Issue #​37 Allow arbitrary scaling laws for centroid sizes

Issue #​29 Test method testScaling() always adds values in ascending order

Issue #​84 Remove deprecated kinds of t-digest

Issue #​76 Add serializability

Issue #​77 Question: Proof of bounds on merging digest size

Issue #​71 Simple alternate algorithm using maxima, ranks and fixed cumulative weighting

Issue #​61 Possible improvement to the speed of the algorithm

Issue #​58 jdk8 doclint incompatibility

Issue #​48 Build is unstable under some circumstances

Issue #​63 Which TDigest do you recommend?

Issue #​62 Very slow performance; what am I missing?

Issue #​47 Make TDigest serializable

Issue #​49 MergingDigest.centroids is wrong on an empty digest


Configuration

📅 Schedule: Branch creation - "* * * * *" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • [ ] If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot

solrbot avatar Dec 08 '23 14:12 solrbot

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] avatar Mar 07 '24 00:03 github-actions[bot]

So turns out the Streaming tests fail consistently with this upgrade. Have not dived into it, but there must be some incompatible behavior between these versions causing this. Anyone knows why?

janhoy avatar Mar 07 '24 11:03 janhoy

t-digest 3.2 looks fine but 3.3 seems to have different stats assertions? The changes look to be around percentile.

risdenk avatar Mar 08 '24 03:03 risdenk

The nitty gritty details look to be here:

https://github.com/tdunning/t-digest/issues/171

and

https://github.com/opensearch-project/OpenSearch/pull/3634

basically for small sample sizes there used to be a bug and now its fixed. That seems to be my rough interpretation so its not surprising that our tests are failing.

risdenk avatar Mar 08 '24 03:03 risdenk

Pushed e0d36967626d1e261c172739974d534dc81ce219 updating the assertions in the percentiles for tests.

risdenk avatar Mar 08 '24 03:03 risdenk

Edited/Blocked Notification

Renovate will not automatically rebase this PR, because it does not recognize the last commit author and assumes somebody else may have edited the PR.

You can manually request rebase by checking the rebase/retry box above.

Warning: custom changes will be lost.

solrbot avatar Mar 08 '24 04:03 solrbot