tdigest icon indicating copy to clipboard operation
tdigest copied to clipboard

Fix weights

Open JonathanZailer opened this issue 4 years ago • 3 comments

If I'm not mistaken the weights are in wrong order.

JonathanZailer avatar Mar 31 '20 14:03 JonathanZailer

Hi @JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786

Can you detail why you think they should be switched?

CamDavidsonPilon avatar Apr 01 '20 00:04 CamDavidsonPilon

Hey, From the code we have the next variables: ci - the current bucket ci_plus_1 - the next bucket "k = (c_i_plus_one.count + c_i.count) / 2." which mean that k is the "step" and distance between the centers of the buckets From last loop we have t which is the total distance seen so far "t += k" "z1 = p - t" means the distance from the center ci "z2 = t + k - p" means the distance from the center of ci_plus_1 Therefore, the weighted arithmetic mean here is wrong "(c_i.mean * z2 + c_i_plus_one.mean * z1) / (z1 + z2)" The distance to ci_plus_1 is multiplied by c_i_plus_one.mean and vice versa. My fix was to change z1 to z2 and z2 to z1 in the weighted arithmetic mean.

‫בתאריך יום ד׳, 1 באפר׳ 2020 ב-3:09 מאת ‪Cameron Davidson-Pilon‬‏ <‪ [email protected]‬‏>:‬

Hi @JonathanZailer https://github.com/JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786

Can you detail why you think they should be switched?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CamDavidsonPilon/tdigest/pull/52#issuecomment-606952435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4W257UWGJXBNMFSVIPSADRKKA35ANCNFSM4LXTXV7Q .

JonathanZailer avatar Apr 01 '20 08:04 JonathanZailer

Hey,

I know why the test failed. I'll try to see if I can add another fix to the code.(commit) FYI- I didn't want to divide into the test but I was curious why it failed :)

Good day,

Jonathan

‫בתאריך יום ד׳, 1 באפר׳ 2020 ב-11:48 מאת ‪jonathan zailer‬‏ <‪ [email protected]‬‏>:‬

Hey, From the code we have the next variables: ci - the current bucket ci_plus_1 - the next bucket "k = (c_i_plus_one.count + c_i.count) / 2." which mean that k is the "step" and distance between the centers of the buckets From last loop we have t which is the total distance seen so far "t += k" "z1 = p - t" means the distance from the center ci "z2 = t + k - p" means the distance from the center of ci_plus_1 Therefore, the weighted arithmetic mean here is wrong "(c_i.mean * z2 + c_i_plus_one.mean * z1) / (z1 + z2)" The distance to ci_plus_1 is multiplied by c_i_plus_one.mean and vice versa. My fix was to change z1 to z2 and z2 to z1 in the weighted arithmetic mean.

‫בתאריך יום ד׳, 1 באפר׳ 2020 ב-3:09 מאת ‪Cameron Davidson-Pilon‬‏ <‪ [email protected]‬‏>:‬

Hi @JonathanZailer https://github.com/JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786

Can you detail why you think they should be switched?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CamDavidsonPilon/tdigest/pull/52#issuecomment-606952435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4W257UWGJXBNMFSVIPSADRKKA35ANCNFSM4LXTXV7Q .

JonathanZailer avatar Apr 01 '20 11:04 JonathanZailer