thanks_for_posts

Apply an exponential easing function to ratings for more even distribution

Open lionel-rowe opened this issue 2 years ago • 4 comments

Fix for the problem I highlighted here: https://www.phpbb.com/customise/db/extension/thanks_for_posts_2/support/topic/236261

One major problem with how ratings are calculated is that they tend to obey Benford's Law, giving an exponentially skewed distribution.

For example, in a typical forum, there might be just one post with 100 thanks, a few hovering around the 80-90 mark, and the vast majority having just a few. In this hypothetical forum, almost all posts would have ratings close to zero, implying they're bad (or at least not particularly valuable). If fewer than 2% of posts gained more than 5 thanks, a post with 5 thanks would already be in the 98th percentile of outstanding posts, yet its rating would show only 5%, because it is ranked against that 100-thank post instead of the vast majority of its peers!

The most accurate way of fixing this problem would be to rate posts by their percentile; however, this would massively complicate the calculation logic and probably impact performance a lot, as every single new post would affect the rating of every single other post.
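To make the cost concrete, here is a minimal Python sketch of a percentile-based rating. It is an illustration only, not the extension's PHP code; the function name `percentile_ratings` and the sample counts are hypothetical. Each rating depends on the full list of thanks counts, which is why any change to one post can shift the ratings of others.

```python
from bisect import bisect_right


def percentile_ratings(thanks_counts):
    """Rate each post by the percentage of posts with at most as many thanks."""
    ordered = sorted(thanks_counts)  # needs every post's count: O(n log n)
    total = len(ordered)
    # bisect_right gives the number of posts with a count <= t.
    return [100 * bisect_right(ordered, t) / total for t in thanks_counts]


# Hypothetical forum: mostly low counts plus one 100-thank outlier.
counts = [2, 0, 0, 1, 6, 11, 2, 5, 4, 100]
print(percentile_ratings(counts))
```

Recomputing this on every page view would mean sorting all posts each time, so a real implementation would presumably need some form of caching.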

A much simpler solution would be to apply an exponential easing function to the current ratings to adjust them. This would counteract the exponential effect from Benford's Law and give a much more even distribution.

lionel-rowe, Mar 28 '22 03:03

The test suite is failing on the MSSQL 2017 step, presumably for a reason unrelated to the PR, as my code changes purely affect formatting and don't touch anything database-related.

lionel-rowe, Mar 28 '22 20:03

Just to clarify: x is the value of $row['post_thanks'] / $max_post_thanks, and y is the resulting post rating in %.

So a post with 1/6 of the thanks count of the most-thanked post will get a rating of ~66%, and a post with 1/3 of that count will get a rating of ~90%. I'm not sure this is the correct approach from the point of view of evaluating posts.

Current rating distribution: y(x) = 100 * x

New rating distribution: y(x) = (1 - 2^(-10x)) * 100

[Figure: plots of the current and new rating distributions]
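The two formulas can be evaluated directly. The following is a minimal Python sketch, not the extension's PHP code; the function names `linear_rating` and `eased_rating` are hypothetical. With it, the new formula evaluates to about 68.5% at x = 1/6 and about 90.1% at x = 1/3, in the same range as the rough figures quoted above.

```python
def linear_rating(x):
    """Current rating: y(x) = 100 * x, with x in [0, 1]."""
    return 100 * x


def eased_rating(x):
    """Proposed rating: y(x) = (1 - 2^(-10x)) * 100, with x in [0, 1]."""
    return (1 - 2 ** (-10 * x)) * 100


# x is post_thanks / max_post_thanks, as defined in the comment above.
for x in (1 / 6, 1 / 3, 1.0):
    print(f"x={x:.3f}  linear={linear_rating(x):5.1f}%  eased={eased_rating(x):5.1f}%")
```

One detail worth noting: under this formula the top post itself (x = 1) evaluates to about 99.9% rather than exactly 100%, since 2^(-10) is small but not zero.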

rxu, Mar 29 '22 03:03

@rxu

> So a post with 1/6 of the thanks count of the most-thanked post will get a rating of ~66%, and a post with 1/3 of that count will get a rating of ~90%.

Yes, that's correct. The reasoning is that top-rated posts already tend to follow an exponential distribution: for example, in a forum where "100%" (the top rated post) has 100 thanks, getting the number of thanks for a random sample of 10 posts will typically look something like "2, 0, 0, 1, 6, 11, 2, 5, 4, 0" rather than "94, 27, 38, 90, 73, 6, 18, 46, 62, 13".
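As a rough illustration, the sketch below applies both formulas to the first sample above, assuming the top post has 100 thanks. It is written in Python rather than the extension's PHP, and the names `MAX_THANKS`, `linear_rating`, and `eased_rating` are hypothetical.

```python
MAX_THANKS = 100  # thanks on the top-rated post, as in the example above
sample = [2, 0, 0, 1, 6, 11, 2, 5, 4, 0]


def linear_rating(thanks, max_thanks):
    """Current rating: plain percentage of the top post's thanks."""
    return 100 * thanks / max_thanks


def eased_rating(thanks, max_thanks):
    """Eased rating, using y(x) = (1 - 2^(-10x)) * 100 from this thread."""
    return (1 - 2 ** (-10 * thanks / max_thanks)) * 100


linear = [round(linear_rating(t, MAX_THANKS)) for t in sample]
eased = [round(eased_rating(t, MAX_THANKS)) for t in sample]
print("linear:", linear)
print("eased: ", eased)
```

With the linear formula no post in the sample rates above 11%, while the eased formula spreads the same posts over roughly 0% to 53%.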

As a result, the current ratings tend to be almost universally very low, making it look like the vast majority of posts are of "low quality".

Easing the results exponentially counteracts this, approximating roughly the distribution you'd expect if ratings were percentiles rather than percentages of the top post (using the actual percentile would likely overcomplicate things, as doing so in a performant way would probably require lots of caching).

I'm certainly open to fine-tuning the calculation, or perhaps making it user-configurable somehow, or maybe even revisiting the percentile idea if I can think of a way to simplify it — thoughts?

lionel-rowe, Mar 29 '22 14:03

> I'm certainly open to fine-tuning the calculation, or perhaps making it user-configurable somehow

Makes sense. I guess having a switch or an option to select a distribution strategy (probably with more than just two choices) would be great.
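One possible shape for such an option, sketched in Python rather than the extension's PHP: a lookup table of named strategies selected by a setting. Everything here is an assumption for illustration, including the strategy names, the `STRATEGIES` table, and the `post_rating` function.

```python
from typing import Callable, Dict

# Hypothetical strategy names. Each maps x = post_thanks / max_post_thanks
# (a value in [0, 1]) to a rating in percent.
STRATEGIES: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: 100 * x,
    "exponential": lambda x: (1 - 2 ** (-10 * x)) * 100,
}


def post_rating(post_thanks, max_post_thanks, strategy="linear"):
    """Return a post's rating in percent under the selected strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown rating strategy: {strategy}")
    if max_post_thanks <= 0:
        return 0.0  # no thanks anywhere yet, so nothing to rate against
    return STRATEGIES[strategy](post_thanks / max_post_thanks)


print(post_rating(5, 100))                 # current behaviour
print(post_rating(5, 100, "exponential"))  # eased behaviour
```

Further strategies could be added as extra entries in the table without touching the callers.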

rxu, Mar 30 '22 02:03