tdigest
tdigest copied to clipboard
Fix weights
If I'm not mistaken the weights are in wrong order.
Hi @JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786
Can you detail why you think they should be switched?
Hey, From the code we have the next variables: ci - the current bucket ci_plus_1 - the next bucket "k = (c_i_plus_one.count + c_i.count) / 2." which mean that k is the "step" and distance between the centers of the buckets From last loop we have t which is the total distance seen so far "t += k" "z1 = p - t" means the distance from the center ci "z2 = t + k - p" means the distance from the center of ci_plus_1 Therefore, the weighted arithmetic mean here is wrong "(c_i.mean * z2 + c_i_plus_one.mean * z1) / (z1 + z2)" The distance to ci_plus_1 is multiplied by c_i_plus_one.mean and vice versa. My fix was to change z1 to z2 and z2 to z1 in the weighted arithmetic mean.
בתאריך יום ד׳, 1 באפר׳ 2020 ב-3:09 מאת Cameron Davidson-Pilon < [email protected]>:
Hi @JonathanZailer https://github.com/JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786
Can you detail why you think they should be switched?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CamDavidsonPilon/tdigest/pull/52#issuecomment-606952435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4W257UWGJXBNMFSVIPSADRKKA35ANCNFSM4LXTXV7Q .
Hey,
I know why the test failed. I'll try to see if I can add another fix to the code.(commit) FYI- I didn't want to divide into the test but I was curious why it failed :)
Good day,
Jonathan
בתאריך יום ד׳, 1 באפר׳ 2020 ב-11:48 מאת jonathan zailer < [email protected]>:
Hey, From the code we have the next variables: ci - the current bucket ci_plus_1 - the next bucket "k = (c_i_plus_one.count + c_i.count) / 2." which mean that k is the "step" and distance between the centers of the buckets From last loop we have t which is the total distance seen so far "t += k" "z1 = p - t" means the distance from the center ci "z2 = t + k - p" means the distance from the center of ci_plus_1 Therefore, the weighted arithmetic mean here is wrong "(c_i.mean * z2 + c_i_plus_one.mean * z1) / (z1 + z2)" The distance to ci_plus_1 is multiplied by c_i_plus_one.mean and vice versa. My fix was to change z1 to z2 and z2 to z1 in the weighted arithmetic mean.
בתאריך יום ד׳, 1 באפר׳ 2020 ב-3:09 מאת Cameron Davidson-Pilon < [email protected]>:
Hi @JonathanZailer https://github.com/JonathanZailer, there is a failing test related to the percentile change: https://travis-ci.org/github/CamDavidsonPilon/tdigest/jobs/669282786
Can you detail why you think they should be switched?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CamDavidsonPilon/tdigest/pull/52#issuecomment-606952435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4W257UWGJXBNMFSVIPSADRKKA35ANCNFSM4LXTXV7Q .