lemmy
Controversial posts and comments
Posts and comments ordered by most total votes, but with a score close to zero. I guess this should only be available on instances with downvotes enabled.
I don't have time to do this but someone else could.
Can I give this a try?
@iByteABit256 I was just digging into this. 🫠 I came back to propose a calculation for "controversialness". You can have this one. I'll share where I was at, anyway.
I was thinking something like this (but implemented in SQL, similar to the existing `hot_rank` SQL function).
```rust
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}
```
Some examples of how this would work with various inputs can be seen here.
Not bad, but it has a flaw: small changes in the like/dislike ratio can lead to huge changes in "controversialness".
For example, a 98/100 ratio isn't that different from 99/100, but it would have half the score.
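The halving can be checked directly against that function (hypothetical vote counts, picked only to illustrate the point):

```rust
// Nearly identical ratios, wildly different ranks, because the divisor
// is the (small) absolute score.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

fn main() {
    // 98 up / 100 down: score -2, so (98 + 100) / 2 = 99
    assert_eq!(99, controversy_rank(98, 100, -2));
    // 99 up / 100 down: score -1, so (99 + 100) / 1 = 199
    assert_eq!(199, controversy_rank(99, 100, -1));
}
```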
My thinking was something like this:
```rust
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}
```
Which seems more intuitive to me and gives more balanced scores, what do you think?
> it has a flaw that small changes in like/dislike ratio can lead to huge changes in "controversialness".
Does it matter? Will that value be shown, or used for anything other than sorting the comments?
> My thinking was something like this:
>
> ```rust
> fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
>     if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
> }
> ```
The results for that are surprising. 100 upvotes and 100 downvotes results in 0 controversialness. The same as if something has 0 upvotes and 100 downvotes. Similarly, I would expect these to have the same level of controversialness, but they don't:
```rust
assert_eq!(5, controversy_rank(50, 45, 5));
assert_eq!(0, controversy_rank(45, 50, -5));
```
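The asymmetry comes from integer division truncating toward zero; reproducing the function alongside those cases makes it visible:

```rust
// The ratio-based proposal: `upvotes / downvotes` is integer division,
// so any ratio below 1 truncates to 0 and zeroes out the whole rank.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

fn main() {
    // 50 / 45 truncates to 1, so the score survives...
    assert_eq!(5, controversy_rank(50, 45, 5));
    // ...but 45 / 50 truncates to 0, which wipes the rank out.
    assert_eq!(0, controversy_rank(45, 50, -5));
    // A perfect 100/100 split scores 0, same as 0 up / 100 down.
    assert_eq!(0, controversy_rank(100, 100, 0));
    assert_eq!(0, controversy_rank(0, 100, -100));
}
```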
You're right, it needs some work. Also, what I was thinking for the multiplier was actually `(upvotes + downvotes)` to represent activity, since a 50-50 post with 2 total votes is much less controversial than a 50-50 post with 1000 votes.
Your way definitely gives good enough results, though; I just want to explore it a bit before implementing it.
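One way that multiplier idea could look (a hypothetical sketch of my own, not a formula anyone in the thread committed to; it sidesteps the truncation problem by using floats):

```rust
// Sketch only: scale activity (total votes) by a truncation-free
// balance ratio. balance is in (0, 1]: 1.0 for a perfect split,
// near 0 when the votes are lopsided.
fn controversy_rank(upvotes: u32, downvotes: u32) -> f64 {
    if upvotes == 0 || downvotes == 0 {
        return 0.0;
    }
    let activity = (upvotes + downvotes) as f64;
    let balance = upvotes.min(downvotes) as f64 / upvotes.max(downvotes) as f64;
    activity * balance
}
```

This keeps a 50-50 post with 1000 votes (rank 1000.0) far above a 50-50 post with 2 votes (rank 2.0), and it is symmetric in upvotes and downvotes.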
Yeah, that's what I was thinking, too. The total number of votes should be significant, here.
It's definitely worth exploring.
Here's something to show the output better and let you fiddle with the math more. I originally was just using a spreadsheet to try different approaches.
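A rough illustration of that kind of table (an assumption on my part, not the code from the linked playground): print a grid of `controversy_rank` values for a few vote combinations.

```rust
// Hypothetical sketch: tabulate the first proposal's output so the
// shape of the curve is easier to eyeball.
fn controversy_rank(upvotes: u32, downvotes: u32) -> u32 {
    let score = upvotes as i32 - downvotes as i32;
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

fn main() {
    let steps = [0u32, 10, 25, 50, 100];
    print!("{:>8}", "up\\down");
    for d in steps {
        print!("{:>6}", d);
    }
    println!();
    for u in steps {
        print!("{:>8}", u);
        for d in steps {
            print!("{:>6}", controversy_rank(u, d));
        }
        println!();
    }
}
```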
Printing it out as a table made it much clearer; I think it's good enough to keep.
All of the high scores are highly controversial, and the score clearly scales with the amount of activity.
I agree. It seems good. I'd like to see some more people chime in with opinions, but maybe that'll come with a PR. At the very least, it's something that can be moved forward with.
Edit: Playing with the output visualization more because I was bored and it was pleasing. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=130155f2c33aa262c403427b8235dd82
That's proof enough haha, really cool!
Looking at that output makes me think it might be worthwhile to subsort by something like activity (descending), or some other existing sort type. Things that aren't controversial get to a pretty flat curve fairly quickly, and otherwise may result in inconsistent ordering (if that's important in any way).
It might also be helpful if the "controversy score" were visible in the UI when sorting this way.
To throw in another idea: `min(upvotes, downvotes)`.
However, its primary advantage is that it is simpler, and so easier for users to understand.
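Written out as a function (a trivial wrapper, shown only for comparison with the other snippets), it rewards balance and activity at once, since the smaller side can only be large if both sides are:

```rust
// The min(upvotes, downvotes) idea as a function.
fn controversy_rank(upvotes: u32, downvotes: u32) -> u32 {
    upvotes.min(downvotes)
}

fn main() {
    assert_eq!(100, controversy_rank(100, 100)); // busy, perfectly split
    assert_eq!(0, controversy_rank(0, 100));     // one-sided: not controversial
    assert_eq!(45, controversy_rank(50, 45));
}
```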
That seems like it has worked well in the past.
```python
cpdef double controversy(long ups, long downs):
    """The controversy sort."""
    if downs <= 0 or ups <= 0:
        return 0

    magnitude = ups + downs
    balance = float(downs) / ups if ups > downs else float(ups) / downs

    return magnitude ** balance
```
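For comparison in the same language as the earlier snippets, a direct Rust port of that Cython function might look like this (my own translation, not code from either codebase):

```rust
// Sketch: reddit's controversy formula, magnitude ** balance, in Rust.
fn controversy(ups: u32, downs: u32) -> f64 {
    if ups == 0 || downs == 0 {
        return 0.0;
    }
    let magnitude = (ups + downs) as f64;
    // balance is in (0, 1]: 1.0 for an even split, small when lopsided
    let balance = if ups > downs {
        downs as f64 / ups as f64
    } else {
        ups as f64 / downs as f64
    };
    magnitude.powf(balance)
}
```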
Here is a comparison between @dcormier's, @qznc's and Reddit's methods.
Reddit's looks the most correct overall, but @dcormier's looks almost as good and is much more performant, since it doesn't involve float arithmetic or powers. @qznc's is the most performant, but its results are noticeably worse, judging from this.
Edit: Added an alteration of my own method after realising the main problem with it and how Reddit solved it.
Edit: Changed the debug build to a release build and computed the absolute value manually instead of using Rust's abs(), which seemed to be much faster. The results of the first 3 all seem good enough; time seems to be slightly better for the ratio method, but take that with a grain of salt. After all, this is going to be implemented in SQL, not Rust.
That's not a very effective way of benchmarking in this case, unfortunately. The results are wildly different from run to run, and even within the same run. I.e., not only do the numbers change quite a bit from run to run, but two algorithms that had similar times in one run might have disparate times in another. Using `cargo bench` (which requires nightly) or Criterion.rs would show differences more clearly.
Regardless, it's probably not worth benchmarking that in Rust. The existing `hot_rank` function, used to produce the value to sort by when sorting on hot, lives in SQL, not Rust. I would expect this function to end up being similar.
The Reddit method produces a gentler curve, which is nice.
I had a pretty lucky streak when I first wrote it, but yeah, unfortunately it seems completely indeterminate now that I've tried it again a few times.