
The rank of a post in the aggregated feed should be inversely proportional to the size of the community

Open half-adder opened this issue 4 years ago • 6 comments

Is your proposal related to a problem?

It's hard to see posts from smaller communities when you are also subscribed to larger communities.

Describe the solution you'd like

The weight of a post in the aggregated feed should be inversely proportional to the size of the community. This will allow posts from smaller communities (which get fewer upvotes) to float higher in the aggregated main feed, and be interspersed with posts from larger communities (which get many upvotes).

Consider a user that is subscribed to 3 communities:

C_0: 3 subscribers
C_1: 100 subscribers
C_2: 1000 subscribers

Then, an additional term could be added to the weight of the posts from each respective community:

C_0: weight * s * (1/3)
C_1: weight * s * (1/100)
C_2: weight * s * (1/1000)

weight = weight as it exists today
s = "scale factor" (i.e. how much the size of the community negatively affects the weight)
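As a rough illustration, the extra term could be sketched like this in Python. The helper name `scaled_weight` and the placeholder values (a base weight of 1.0 and s = 1.0) are assumptions made for the example, not anything taken from Lemmy's code.

```python
def scaled_weight(weight: float, subscribers: int, s: float = 1.0) -> float:
    """Hypothetical helper: multiply today's weight by s * (1 / subscribers)."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return weight * s * (1.0 / subscribers)


# The three communities from the example above.
communities = {"C_0": 3, "C_1": 100, "C_2": 1000}

# Same base weight everywhere, so only the community-size term differs.
scaled = {name: scaled_weight(1.0, size) for name, size in communities.items()}

for name, value in scaled.items():
    print(name, value)
```

With an identical base weight, the post from C_0 ends up with the largest scaled weight and the post from C_2 with the smallest.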

Describe alternatives you've considered

Instead of community size, maybe other indicators could be used. Off the top of my head, perhaps the average number of upvotes in the community (or a rolling average, of say the last week).

Additional context

That's it. Thanks for all of your work, Lemmy is really cool! I would make a PR but I've never used Rust...



half-adder avatar Jul 26 '20 03:07 half-adder

OK I thought about this a little more.

Here's my first pass at a function that does this mapping.

z = rank / (1 + (scale_factor * community_size))

where rank is the ranking as described in https://dev.lemmy.ml/docs/about_ranking.html (note, rank should be in the range [0,1]. I assume this is the range you get when you divide the rank described by 10k?)

community_size also needs to be in the range [0, 1]. I think the most sensible way to achieve this is to normalize the community size relative to each user. So, 0 is mapped to the # of subscribers that the user's least-subscribed community has, and 1 is mapped to the # of subscribers that the user's most-subscribed community has, using the process described here.
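A minimal sketch of that per-user normalization followed by the mapping, assuming plain min-max scaling over the user's subscribed communities. The function names and the sample numbers are made up for illustration.

```python
from typing import Dict


def normalize_sizes(sizes: Dict[str, int]) -> Dict[str, float]:
    """Min-max scale subscriber counts to [0, 1] across one user's communities."""
    smallest, largest = min(sizes.values()), max(sizes.values())
    if largest == smallest:
        # All communities are the same size, so none should be penalized.
        return {name: 0.0 for name in sizes}
    return {name: (n - smallest) / (largest - smallest) for name, n in sizes.items()}


def adjusted_rank(rank: float, community_size: float, scale_factor: float) -> float:
    """z = rank / (1 + (scale_factor * community_size)), both inputs in [0, 1]."""
    return rank / (1.0 + scale_factor * community_size)


subscribed = {"C_0": 3, "C_1": 100, "C_2": 1000}
normalized = normalize_sizes(subscribed)

# Same rank in every community; only the normalized size changes z.
z_values = {name: adjusted_rank(0.5, size, 4.0) for name, size in normalized.items()}
```

The smallest community keeps its rank unchanged (its normalized size is 0), while the largest one is divided by 1 + scale_factor.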

And here is what that mapping looks like for various scale_factor (the colors are quantized here to be able to more clearly see the contours):

[figure: lemmyfig]

As you can see, for larger communities, it takes a larger number of votes to get the equivalent z as a smaller community. The degree to which this applies is controlled by the scale_factor. So, I think this function achieves the desired result.

half-adder avatar Jul 26 '20 07:07 half-adder

Hi! This sounds like a good idea. With the current ranking, very big communities seem to be over-represented on the homepage. Sometimes a couple of communities account for 80% of the homepage.
All things being equal, it does make sense for posts with lots of upvotes/comments to get a higher rank. But it would be good to display a diversity of communities on Lemmy's homepage.

@half-adder you suggest possible scaling, the graph seems like a good start. However I don't understand the need for normalization. Is this required by Lemmy's design? When looking at Lemmy's documentation on ranking I see values as high as 600.

Normalization would probably be difficult in a fediverse setting. You'd either need to

  • Set an arbitrary maximum at which the score is capped. Arbitrary things are usually bad, and in this case posts above this max would have identical scores.
  • Or, look at all the fediverse's communities and sort them to find the largest one to get a maximum. That's a relatively complex operation just to compute a post's score. And since community sizes change often, you'd need to recompute all post scores continuously, or accept that scores may be outdated (i.e. normalized using an old maximum).

Scaling without normalizing would be saner IMHO, in order to obtain ranks that are absolutes, can be computed independently of other communities' size, and can be globally compared across the fediverse. Using either log() or inverse pow() functions.

For instance:

z = rank / log(1 + community_size * factor)
z = rank / (community_size^(1/factor))
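Both variants are easy to try out. Here is a small Python sketch, assuming the natural logarithm and a raw (un-normalized) subscriber count of at least 1. The function names and sample values are only for illustration. Note that log(1 + community_size * factor) is below 1 when community_size * factor is smaller than about 1.72, so very small communities would get a boost rather than a penalty with the log form.

```python
import math


def log_scaled(rank: float, community_size: int, factor: float) -> float:
    """z = rank / log(1 + community_size * factor), natural log assumed."""
    if community_size <= 0 or factor <= 0:
        raise ValueError("community_size and factor must be positive")
    return rank / math.log(1 + community_size * factor)


def pow_scaled(rank: float, community_size: int, factor: float) -> float:
    """z = rank / (community_size^(1/factor))."""
    if community_size <= 0 or factor <= 0:
        raise ValueError("community_size and factor must be positive")
    return rank / community_size ** (1 / factor)


# Same rank, different community sizes: larger communities are damped more.
for size in (3, 100, 1000):
    print(size, log_scaled(10.0, size, 1.0), pow_scaled(10.0, size, 2.0))
```

Neither function needs to know anything about other communities, which is the point of skipping normalization.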

guillaume-uH57J9 avatar Dec 26 '22 18:12 guillaume-uH57J9

The z = rank / log(1 + community_size * factor) would be appropriate, but it should also use the monthly or weekly active users (i.e. activity), rather than community size, which is mostly useless.

Since there would then be two scale factors (one that affects the timed rank, and one that affects the community activity), they would need to be tuned in such a way as to not swamp out the other's effect.

The time influence should always be stronger.

That would make the final rank something like:

z = ScaleFactor1 * log(Max(1, 3 + Score)) / (Time + 2)^Gravity / log(1 + active_monthly_users * ScaleFactor2)

or

z = log_score_factor / pow_time_decay / log_community_activity
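Spelled out in Python, the combined formula might look like the sketch below. The default values (ScaleFactor1 = 10000, Gravity = 1.8, ScaleFactor2 = 1.0), the use of hours for Time, and the clamping of active users to at least 1 (to avoid dividing by log(1) = 0) are all assumptions for the example and would need tuning. Since both logs use the same base, the choice of base does not change z.

```python
import math


def combined_rank(
    score: int,
    hours_old: float,
    active_monthly_users: int,
    scale_factor_1: float = 10_000.0,  # assumed value
    gravity: float = 1.8,              # assumed value
    scale_factor_2: float = 1.0,       # hypothetical, needs tuning
) -> float:
    """z = log_score_factor / pow_time_decay / log_community_activity."""
    # Assumption: treat communities with no active users as having one,
    # so the last divisor never becomes log(1) = 0.
    users = max(active_monthly_users, 1)

    log_score_factor = scale_factor_1 * math.log(max(1, 3 + score))
    pow_time_decay = (hours_old + 2) ** gravity
    log_community_activity = math.log(1 + users * scale_factor_2)
    return log_score_factor / pow_time_decay / log_community_activity


# Same score and age, different community activity.
small = combined_rank(score=10, hours_old=1, active_monthly_users=50)
large = combined_rank(score=10, hours_old=1, active_monthly_users=5000)
# Same score and community, one day older.
older = combined_rank(score=10, hours_old=24, active_monthly_users=50)
```

Here `small` is greater than `large`, and `small` is greater than `older`, so both the activity term and the time decay pull in the expected direction.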

dessalines avatar Dec 29 '22 16:12 dessalines

> This way you would force all instances to be about all topics equally. I personally don't like to see so much Shit Reactionaries Say posts from Lemmygrad.ml on Lemmy.ml but while this would fix that, it creates a bigger problem than what it's fixing. There has to be a better way.

Block any communities you don't want to see, or use the Local or Subscribed filter to not see federated communities. This issue is completely separate from that.

> What I would like is for users to be able to give communities a weight in the form of 0-100 points represented as 0 to 5 stars and get an amount of posts from each community in their feed proportional to the weight. Otherwise assign a weight automatically to each community based on each user interactions with the posts in the community as a percentage of upvotes vs downvotes.

This sounds incredibly complicated for users or admins to do, when all they want is to see posts from both smaller communities and larger ones, without having to explicitly add weighting values for each.

@guillaume-uH57J9 's solution is the best way to handle this.

dessalines avatar Feb 06 '23 18:02 dessalines

For now could we add this to the post select (https://github.com/LemmyNet/lemmy/blob/b214d3dc00c269d7987ace7f5522e2ff406eec03/crates/db_views/src/post_view.rs#LL288C1-L288C16)

ROW_NUMBER() OVER (PARTITION BY post.community_id ORDER BY post_aggregates.score DESC) AS community_rank

I tried with all my might to get this translated into Diesel, but it seems Rust has gotten the better of me.

Explanation: It assigns each post a rank number based on its score within its community. We then create a sort for Best Day, Best Month, etc. (I can create the sorts)
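To show what that window function produces, here is a plain Python emulation of ROW_NUMBER() OVER (PARTITION BY community_id ORDER BY score DESC). The post dictionaries and the final sort by (community_rank, score) are assumptions for illustration, not Lemmy's schema or an existing sort. Ties are broken by input order here, whereas SQL leaves the order of tied rows unspecified.

```python
from collections import defaultdict
from typing import Dict, List


def community_ranks(posts: List[dict]) -> Dict[int, int]:
    """Map post id -> 1-based position by score within its own community."""
    by_community = defaultdict(list)
    for post in posts:
        by_community[post["community_id"]].append(post)

    ranks = {}
    for group in by_community.values():
        ordered = sorted(group, key=lambda p: p["score"], reverse=True)
        for position, post in enumerate(ordered, start=1):
            ranks[post["id"]] = position
    return ranks


posts = [
    {"id": 1, "community_id": 10, "score": 900},
    {"id": 2, "community_id": 10, "score": 800},
    {"id": 3, "community_id": 20, "score": 5},
    {"id": 4, "community_id": 20, "score": 3},
]
ranks = community_ranks(posts)

# Hypothetical "best" sort: top post of every community first, then second, ...
feed = sorted(posts, key=lambda p: (ranks[p["id"]], -p["score"]))
feed_ids = [p["id"] for p in feed]
```

Sorting on the per-community rank first puts the top post of the small community (score 5) ahead of the second post of the large community (score 800).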

@dessalines @Nutomic

L3v3L avatar Jun 21 '23 05:06 L3v3L

> @half-adder you suggest possible scaling, the graph seems like a good start. However I don't understand the need for normalization. Is this required by Lemmy's design?

I don't think you need to normalize, but it made visual comparison of the scaling factors easier

half-adder avatar Jun 22 '23 01:06 half-adder

What about balancing instances based on monthly active users instead of communities?

Request for Comments: Balance Scores Based on Monthly Active Users

ghost avatar Jun 27 '23 12:06 ghost

Thank you! <3

Atemu avatar Sep 07 '23 07:09 Atemu