thanks_for_posts
Apply an exponential easing function to ratings for more even distribution
Fix for the problem I highlighted here: https://www.phpbb.com/customise/db/extension/thanks_for_posts_2/support/topic/236261
One major problem with how ratings are calculated is that thanks counts tend to follow a Benford's Law-like pattern, giving an exponentially skewed distribution.
For example, a typical forum might have just one post with 100 thanks, a few hovering around the 80-90 mark, and the vast majority with just a few. In this hypothetical forum, almost all posts would have ratings close to zero, implying they're bad (or at least not particularly valuable). If fewer than 2% of posts gained more than 5 thanks, a post with 5 thanks would already be in the 98th percentile. Yet its rating would show only 5%, because it is ranked against that 100-thank post instead of against the vast majority of its peers!
The most accurate fix would be to rate posts by their percentile. However, this would considerably complicate the calculation logic and would probably hurt performance, since every new post (or new thank) could change the rating of every other post.
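The contrast between the current rating and a percentile rating can be made concrete with a minimal Python sketch. The extension itself is PHP, and `linear_rating` and `percentile_rating` are hypothetical helper names used only for illustration. The forum data is made up to match the example: one post with 100 thanks, one with 5, and 98 posts with 0 or 1. The percentile version needs the full, sorted list of every post's thanks count, and that list must be rebuilt whenever any count changes. This is the performance concern described above.

```python
from bisect import bisect_left


def linear_rating(thanks: int, max_thanks: int) -> float:
    """Current behaviour: thanks as a share of the most-thanked post, in %."""
    return 100.0 * thanks / max_thanks if max_thanks else 0.0


def percentile_rating(thanks: int, all_thanks: list[int]) -> float:
    """Percentile rank: % of posts with strictly fewer thanks than this one."""
    # Re-sorted on every call here for simplicity; any change to any post's
    # count can move every other post's percentile.
    ordered = sorted(all_thanks)
    below = bisect_left(ordered, thanks)
    return 100.0 * below / len(ordered)


if __name__ == "__main__":
    # Hypothetical forum of 100 posts: one 100-thank post, one 5-thank post,
    # and 98 posts with 0 or 1 thanks.
    counts = [100, 5] + [0] * 49 + [1] * 49
    print(linear_rating(5, 100))             # 5.0
    print(percentile_rating(5, counts))      # 98.0
```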
A much simpler solution is to apply an exponential easing function to the current ratings. This would counteract the exponential skew and give a much more even distribution.
The test suite is failing on the MSSQL 2017 step, presumably for a reason unrelated to this PR, since my changes only affect how ratings are computed for display and don't touch anything database-related.
Just to clarify: $x$ is the value of `$row['post_thanks'] / ($max_post_thanks)` and $y$ is the resulting post rating in %.
| Current rating distribution, $y(x) = 100x$ | New rating distribution, $y(x) = 100\,(1 - 2^{-10x})$ |
|---|---|
| *(plot not preserved)* | *(plot not preserved)* |
@rxu
So a post with 1/6 of the thanks count of the most-thanked post will get a rating of ~68%, and a post with 1/3 of that count will get ~90%.
Yes, that's correct. The reasoning is that thanks counts already tend to follow an exponential distribution. For example, in a forum where "100%" (the top-rated post) has 100 thanks, the thanks counts of a random sample of 10 posts will typically look something like "2, 0, 0, 1, 6, 11, 2, 5, 4, 0" rather than "94, 27, 38, 90, 73, 6, 18, 46, 62, 13".
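The easing curve $y(x) = 100\,(1 - 2^{-10x})$ can be checked against the numbers discussed here with a minimal Python sketch. This is an illustration, not the PR's PHP code. `eased_rating` is a hypothetical name, and the steepness `k = 10` is the constant from the formula. The sketch evaluates the two ratios rxu mentions and the ten-post sample above, taking 100 thanks as the top post.

```python
def eased_rating(thanks: int, max_thanks: int, k: float = 10.0) -> float:
    """Exponential ease-out: y = 100 * (1 - 2^(-k*x)), x = thanks / max_thanks."""
    if max_thanks <= 0:
        return 0.0  # no thanks anywhere yet: avoid division by zero
    x = thanks / max_thanks
    return 100.0 * (1.0 - 2.0 ** (-k * x))


if __name__ == "__main__":
    # 1/6 and 1/3 of the top post's thanks
    print(round(eased_rating(1, 6), 1), round(eased_rating(1, 3), 1))  # 68.5 90.1
    # The ten-post sample, top post = 100 thanks (linear ratings equal the counts)
    sample = [2, 0, 0, 1, 6, 11, 2, 5, 4, 0]
    print([round(eased_rating(t, 100)) for t in sample])
    # [13, 0, 0, 7, 34, 53, 13, 29, 24, 0]
```

With this formula the top post itself rates about 99.9% rather than 100%, because $1 - 2^{-10} \approx 0.999$. Dividing by $(1 - 2^{-k})$ would pin the top post at exactly 100% if that matters.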
As a result, the current ratings tend to be almost universally very low, making it look like the vast majority of posts are of "low quality".
Easing the results exponentially counteracts this, roughly approximating the distribution you'd expect if ratings were percent*iles* rather than percent*ages* of the top post. Using the actual percentile would likely overcomplicate things, as doing it performantly would probably require lots of caching.
I'm certainly open to fine-tuning the calculation, perhaps making it user-configurable somehow, or even revisiting the percentile idea if I can think of a way to simplify it. Thoughts?
> I'm certainly open to fine-tuning the calculation, or perhaps making it user-configurable somehow
Makes sense. I think a switch or an option to select the distribution strategy (possibly choosing among more than just two) would be great.
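A selectable strategy could look roughly like the following minimal Python sketch. It is illustrative only. `RatingStrategy`, `post_rating`, and the option values are hypothetical names and not part of the extension or of phpBB's config API. The logarithmic option is an example third strategy that nobody in the thread proposed. `k` is the steepness constant from the proposed formula.

```python
import math
from enum import Enum


class RatingStrategy(Enum):
    """Hypothetical admin-selectable option; names are illustrative only."""
    LINEAR = "linear"            # current behaviour: y = 100 * x
    EXPONENTIAL = "exponential"  # proposed easing: y = 100 * (1 - 2^(-k*x))
    LOGARITHMIC = "logarithmic"  # example third option: log(1+t) / log(1+max)


def post_rating(thanks: int, max_thanks: int,
                strategy: RatingStrategy = RatingStrategy.LINEAR,
                k: float = 10.0) -> float:
    """Return a post's rating in % under the chosen strategy."""
    if max_thanks <= 0:
        return 0.0  # nothing thanked yet
    x = thanks / max_thanks
    if strategy is RatingStrategy.LINEAR:
        return 100.0 * x
    if strategy is RatingStrategy.EXPONENTIAL:
        return 100.0 * (1.0 - 2.0 ** (-k * x))
    if strategy is RatingStrategy.LOGARITHMIC:
        return 100.0 * math.log1p(thanks) / math.log1p(max_thanks)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for s in RatingStrategy:
        print(s.value, round(post_rating(5, 100, s), 1))
```

Keeping the strategy as a single stored option makes it possible to add further curves without touching how `post_thanks` and `max_post_thanks` are gathered.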