Controversial posts and comments

Open JediMaster25 opened this issue 2 years ago • 18 comments

Posts and comments that have the most total votes but a score close to zero. I guess this should only be available on instances that have downvotes enabled.

JediMaster25 avatar Oct 26 '22 07:10 JediMaster25

I don't have time to do this but someone else could.

dessalines avatar Oct 27 '22 20:10 dessalines

Can I give this a try?

iByteABit256 avatar Jun 16 '23 16:06 iByteABit256

@iByteABit256 I was just digging into this. 🫠 I came back to propose a calculation for "controversialness". You can have this one. I'll share where I was at, anyway.

I was thinking something like this (but implemented in SQL, similar to the existing hot_rank SQL function).

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

Some examples of how this would work with various inputs can be seen here.
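
The linked examples are not reproduced in this thread, so here is a minimal sketch of how the formula behaves for a few inputs. `rank_from_votes` is a hypothetical helper that derives the score from the vote counts, and the inputs in the comments are illustrative picks, not the ones from the link.

```rust
/// Same formula as above: total votes divided by the absolute score,
/// treating a score of 0 as 1 to avoid dividing by zero.
fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    (upvotes + downvotes) / if score == 0 { 1 } else { score.unsigned_abs() }
}

/// Hypothetical helper (not from the thread): computes the score as
/// upvotes minus downvotes and passes it on.
fn rank_from_votes(upvotes: u32, downvotes: u32) -> u32 {
    let score = (i64::from(upvotes) - i64::from(downvotes)) as i32;
    controversy_rank(upvotes, downvotes, score)
}

// rank_from_votes(100, 100) == 200  (evenly split, lots of votes)
// rank_from_votes(50, 45)   == 19   (95 votes / score of 5)
// rank_from_votes(45, 50)   == 19   (same split, mirrored)
// rank_from_votes(100, 0)   == 1    (one-sided, not controversial)
```

An even split with many votes ranks highest, and mirrored splits get the same value because only the absolute score is used.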

dcormier avatar Jun 16 '23 17:06 dcormier

Not bad, but it has a flaw that small changes in like/dislike ratio can lead to huge changes in "controversialness".

For example, a 98/100 ratio isn't that different from 99/100, but it would get roughly half the score.
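
To make that concrete, here is a small sketch of the first formula evaluated around an even split. `first_formula` is a hypothetical name for the function proposed earlier, rewritten to take only the vote counts.

```rust
/// Hypothetical wrapper around the first proposal:
/// total votes / |upvotes - downvotes|, with a divisor of at least 1.
fn first_formula(upvotes: u32, downvotes: u32) -> u32 {
    let abs_score = upvotes.abs_diff(downvotes).max(1);
    (upvotes + downvotes) / abs_score
}

// upvotes/downvotes -> rank
// 100/100 -> 200
//  99/100 -> 199
//  98/100 ->  99   (one fewer upvote, about half the rank)
//  97/100 ->  65
```

The drop comes from the divisor going from 1 to 2 to 3 while the total barely changes.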

My thinking was something like this:

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
  if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

This seems more intuitive to me and gives more balanced scores. What do you think?

iByteABit256 avatar Jun 16 '23 18:06 iByteABit256

it has a flaw that small changes in like/dislike ratio can lead to huge changes in "controversialness".

Does it matter? Will that value be shown, or used for anything other than sorting the comments?

My thinking was something like this:

fn controversy_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
  if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

The results for that are surprising. 100 upvotes and 100 downvotes result in 0 controversialness, the same as something with 0 upvotes and 100 downvotes. Similarly, I would expect these to have the same level of controversialness, but they don't:

    assert_eq!(5, controversy_rank(50, 45, 5));
    assert_eq!(0, controversy_rank(45, 50, -5));
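
A short sketch of why those two results differ: `upvotes / downvotes` is integer division on `u32`, so it truncates to 0 whenever downvotes outnumber upvotes. `ratio_rank` is just a hypothetical name for the quoted function.

```rust
/// The quoted proposal under a different (hypothetical) name.
fn ratio_rank(upvotes: u32, downvotes: u32, score: i32) -> u32 {
    if downvotes != 0 { upvotes / downvotes * score.unsigned_abs() } else { 0 }
}

/// The intermediate ratio on its own, to show the truncation.
fn integer_ratio(upvotes: u32, downvotes: u32) -> u32 {
    upvotes / downvotes
}

// integer_ratio(50, 45) == 1, so ratio_rank(50, 45, 5)  == 1 * 5 == 5
// integer_ratio(45, 50) == 0, so ratio_rank(45, 50, -5) == 0 * 5 == 0
// An even split has score 0, so ratio_rank(100, 100, 0) == 1 * 0 == 0
```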

dcormier avatar Jun 16 '23 18:06 dcormier

You're right, it needs some work. Also, what I was thinking for the multiplier was actually (upvotes + downvotes) to represent activity, since a 50-50 post with 2 total votes is much less controversial than a 50-50 post with 1000 votes.

Your way definitely gives good enough results though, I just want to explore it a bit before implementing it

iByteABit256 avatar Jun 16 '23 19:06 iByteABit256

Yeah, that's what I was thinking, too. The total number of votes should be significant, here.

It's definitely worth exploring.

Here's something to show the output better and let you fiddle with the math more. I originally was just using a spreadsheet to try different approaches.

dcormier avatar Jun 16 '23 19:06 dcormier

Printing it out as a table made it much clearer. I think it's good enough to keep.

All of the high scores are highly controversial, and the amount of activity clearly scales with it

iByteABit256 avatar Jun 16 '23 19:06 iByteABit256

I agree. It seems good. I'd like to see some more people chime in with opinions, but maybe that'll come with a PR. At the very least, it's something that can be moved forward with.


Edit: Playing with the output visualization more because I was bored and it was pleasing. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=130155f2c33aa262c403427b8235dd82

dcormier avatar Jun 16 '23 20:06 dcormier

That's proof enough haha, really cool!

iByteABit256 avatar Jun 16 '23 21:06 iByteABit256

Looking at that output makes me think it might be worthwhile to subsort by something like activity (descending), or some other existing sort type. Things that aren't controversial flatten out fairly quickly, so without a secondary sort many of them would tie and might end up in an inconsistent order (if that's important in any way).
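
A rough sketch of what a secondary sort could look like, written in Rust for illustration only. The `Item` struct is hypothetical, and total votes stands in for "activity" here as an assumption; the real sort would presumably live in SQL and use whatever activity measure already exists.

```rust
/// Hypothetical stand-in for a post or comment.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: u32,
    upvotes: u32,
    downvotes: u32,
}

/// Total votes, used here as an assumed proxy for activity.
fn total_votes(item: &Item) -> u64 {
    u64::from(item.upvotes) + u64::from(item.downvotes)
}

/// The first formula from this thread: total votes / |score|, divisor at least 1.
fn rank(item: &Item) -> u64 {
    let abs_score = u64::from(item.upvotes.abs_diff(item.downvotes)).max(1);
    total_votes(item) / abs_score
}

/// Sorts by rank (descending), then total votes (descending),
/// then id (ascending) so the order is fully deterministic.
fn sort_controversial(items: &mut [Item]) {
    items.sort_by(|a, b| {
        rank(b)
            .cmp(&rank(a))
            .then_with(|| total_votes(b).cmp(&total_votes(a)))
            .then_with(|| a.id.cmp(&b.id))
    });
}
```

With this ordering, a 100/0 item and a 10/0 item both have rank 1, and the one with more votes comes first.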

dcormier avatar Jun 16 '23 21:06 dcormier

It might also be helpful if the "controversy score" were visible in the UI when sorting this way too.

jamesmcm avatar Jun 16 '23 21:06 jamesmcm

To throw in another idea:

min(upvotes, downvotes)

Its primary advantage is that it is simpler, so it is easier for users to understand.
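
As a sketch in the same Rust style as the earlier proposals (the function name is hypothetical):

```rust
/// Controversy as the size of the smaller side of the vote.
fn min_rank(upvotes: u32, downvotes: u32) -> u32 {
    upvotes.min(downvotes)
}

// min_rank(100, 100) == 100
// min_rank(50, 45)   == 45
// min_rank(45, 50)   == 45
// min_rank(1000, 1)  == 1
// min_rank(0, 100)   == 0
```

It is symmetric and grows with the vote count, but it only looks at the smaller side, so 100/100 and 1000/100 get the same value.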

qznc avatar Jun 17 '23 07:06 qznc

That seems like it has worked well in the past.

cpdef double controversy(long ups, long downs):
    """The controversy sort."""
    if downs <= 0 or ups <= 0:
        return 0

    magnitude = ups + downs
    balance = float(downs) / ups if ups > downs else float(ups) / downs

    return magnitude ** balance
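
For comparison with the other proposals in this thread, a direct Rust port of the function above might look like the following. This is only a sketch: the name and the `i64`/`f64` types are assumptions chosen to mirror the `long`/`double` in the original.

```rust
/// Rust port of the snippet above: (ups + downs) raised to the power of
/// the minority/majority ratio. Returns 0 if either side has no votes.
fn reddit_controversy(ups: i64, downs: i64) -> f64 {
    if downs <= 0 || ups <= 0 {
        return 0.0;
    }

    let magnitude = (ups + downs) as f64;
    // The ratio is always in (0, 1]: the smaller count over the larger one.
    let balance = if ups > downs {
        downs as f64 / ups as f64
    } else {
        ups as f64 / downs as f64
    };

    magnitude.powf(balance)
}
```

An even split gives a balance of 1, so the result is just the total vote count; as the split gets more one-sided the exponent shrinks toward 0 and the result toward 1.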

ghost avatar Jun 17 '23 07:06 ghost

Here is a comparison between @dcormier's, @qznc's and Reddit's method.

Reddit's looks like the most correct overall. @dcormier's looks almost as good and should be much more performant, since it doesn't involve float arithmetic and powers. @qznc's is the most performant, but judging from this comparison its results are noticeably worse.


Edit: Added an alteration of my own method after realising the main problem with it and how Reddit solved it

Edit: Changed the debug build to a release build and replaced Rust's abs() with a manual absolute-value computation, which seemed to be much faster. The results of the first 3 all seem good enough. Time seems to be slightly better on the ratio method, but take that with a grain of salt. After all, this is going to be implemented in SQL, not Rust.

iByteABit256 avatar Jun 17 '23 16:06 iByteABit256

That's not a very effective way of benchmarking in this case, unfortunately. The results are wildly different from run to run, and even within the same run. I.e., not only do the numbers change quite a bit from run to run, but two algorithms that had similar times in one run might have disparate times in another. Using cargo bench (requires nightly) or Criterion.rs would show differences more clearly.

Regardless, it's probably not worth benchmarking that in Rust. The existing hot_rank function used to produce the value to sort by when sorting on hot lives in SQL, not Rust. I would expect this function to end up being similar.

The Reddit method produces a gentler curve, which is nice.

dcormier avatar Jun 19 '23 14:06 dcormier

I had a pretty lucky streak when I first wrote it, but yeah, unfortunately the timings seem completely inconsistent now that I've tried it again a few times.

iByteABit256 avatar Jun 19 '23 17:06 iByteABit256