lemmy
The rank of a post in the aggregated feed should be inversely proportional to the size of the community
Is your proposal related to a problem?
It's hard to see posts from smaller communities when you are also subscribed to larger communities.
Describe the solution you'd like
The weight of a post in the aggregated feed should be inversely proportional to the size of the community. This will allow posts from smaller communities (which get fewer upvotes) to float higher in the aggregated main feed, and be interspersed with posts from larger communities (which get many upvotes).
Consider a user that is subscribed to 3 communities:
C_0: 3 subscribers
C_1: 100 subscribers
C_2: 1000 subscribers
Then, an additional term could be added to the weight of the posts from each respective community:
C_0: weight * s * (1/3)
C_1: weight * s * (1/100)
C_2: weight * s * (1/1000)
weight = weight as it exists today
s = "scale factor" (i.e. how much the size of the community negatively affects the weight)
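A minimal Python sketch of the extra term described above. It assumes `weight` is whatever rank Lemmy already computes and that `s` is a free tuning parameter; the function name `scaled_weight` is hypothetical and not part of Lemmy.

```python
def scaled_weight(weight: float, subscribers: int, s: float = 1.0) -> float:
    """Scale an existing post weight by the inverse of the community size."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return weight * s * (1.0 / subscribers)


# The three example communities, each with a post of the same base weight.
communities = {"C_0": 3, "C_1": 100, "C_2": 1000}
for name, size in communities.items():
    print(name, scaled_weight(weight=1.0, subscribers=size))
```

With equal base weights, the post from C_0 ends up with the largest scaled weight and the post from C_2 with the smallest.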
Describe alternatives you've considered
Instead of community size, maybe other indicators could be used. Off the top of my head, perhaps the average number of upvotes in the community (or a rolling average over, say, the last week).
Additional context
That's it. Thanks for all of your work, Lemmy is really cool! I would make a PR but I've never used Rust...
OK I thought about this a little more.
Here's my first pass at a function that does this mapping.
z = rank / (1 + (scale_factor * community_size))
where rank is the ranking as described in https://dev.lemmy.ml/docs/about_ranking.html (note: rank should be in the range [0, 1]. I assume this is the range you get when you divide the rank described there by 10k?)
community_size also needs to be in the range [0, 1]. I think the most sensible way to achieve this is to normalize the community size relative to each user: 0 is mapped to the number of subscribers of the user's least-subscribed community, and 1 is mapped to the number of subscribers of the user's most-subscribed community, using the process described here.
And here is what that mapping looks like for various values of scale_factor (the colors are quantized here to make the contours easier to see):
As you can see, a post in a larger community needs a larger number of votes to reach the same z as a post in a smaller community. The degree to which this applies is controlled by scale_factor. So, I think this function achieves the desired result.
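The mapping can be sketched in Python as follows. This assumes `rank` has already been scaled into [0, 1] and that community size is min-max normalized over one user's subscriptions; the helper names `normalize_size` and `z_score` are hypothetical, and returning 0 when all communities are the same size is an assumption of this sketch.

```python
def normalize_size(size: int, sizes: list[int]) -> float:
    """Min-max normalize a community size against one user's subscriptions."""
    lo, hi = min(sizes), max(sizes)
    if hi == lo:
        return 0.0  # assumption: equal-sized communities get no penalty
    return (size - lo) / (hi - lo)


def z_score(rank: float, community_size: float, scale_factor: float) -> float:
    """z = rank / (1 + scale_factor * community_size), inputs in [0, 1]."""
    return rank / (1.0 + scale_factor * community_size)


# Same rank in three communities of different size.
sizes = [3, 100, 1000]
for size in sizes:
    norm = normalize_size(size, sizes)
    print(size, z_score(rank=0.5, community_size=norm, scale_factor=2.0))
```

For the smallest community the normalized size is 0, so z equals the original rank; for the largest it is 1, so z is rank / (1 + scale_factor).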
Hi! This sounds like a good idea.
With the current ranking, very big communities seem to be over-represented on the homepage. Sometimes a couple of communities account for 80% of the homepage.
All things being equal, it does make sense for posts with lots of upvotes/comments to get a higher rank. But it would be good to display a diversity of communities on Lemmy's homepage.
@half-adder you suggest possible scaling, the graph seems like a good start. However I don't understand the need for normalization. Is this required by Lemmy's design? When looking at Lemmy's documentation on ranking I see values as high as 600.
Normalization would probably be difficult in a fediverse setting. You'd either need to:
- Set an arbitrary maximum at which the score is capped. Arbitrary things are usually bad, and in this case posts above this maximum would have identical scores.
- Or look at all the fediverse's communities and sort them to find the largest one to get a maximum. That's a relatively complex operation just to compute a post's score. And since community sizes change often, you'd need to recompute all post scores continuously, or accept that scores may be outdated (i.e. normalized using an old maximum).
Scaling without normalizing would be saner IMHO: it gives ranks that are absolute, can be computed independently of other communities' sizes, and can be compared globally across the fediverse. This could use either log() or inverse pow() functions.
For instance:
z = rank / log(1 + community_size * factor)
z = rank / (community_size^(1/factor))
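Both variants can be written as small Python functions. This is only a sketch: the names `z_log` and `z_pow` are hypothetical, the natural logarithm is an assumption (the comment does not state a base), and both forms need a strictly positive community size to avoid dividing by zero.

```python
import math


def z_log(rank: float, community_size: float, factor: float) -> float:
    """z = rank / log(1 + community_size * factor)."""
    denominator = math.log(1.0 + community_size * factor)
    if denominator <= 0.0:
        raise ValueError("community_size * factor must be positive")
    return rank / denominator


def z_pow(rank: float, community_size: float, factor: float) -> float:
    """z = rank / community_size ** (1 / factor)."""
    if community_size <= 0 or factor <= 0:
        raise ValueError("community_size and factor must be positive")
    return rank / community_size ** (1.0 / factor)


# Same raw rank, three community sizes.
for size in (3, 100, 1000):
    print(size, z_log(600, size, 1.0), z_pow(600, size, 2.0))
```

Neither function needs to know about any other community, so a score computed this way does not change when some unrelated community grows.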
The z = rank / log(1 + community_size * factor) form would be appropriate, but it should use the monthly or weekly active users (i.e. activity) rather than community size, which is mostly useless.
Since there would then be two scale factors (one that affects the timed rank, and one that affects the community activity), they would need to be tuned so that neither swamps out the other's effect.
The time influence should always be stronger.
That would make the final rank something like:
z = ScaleFactor1 * log(Max(1, 3 + Score)) / (Time + 2)^Gravity / log(1 + active_monthly_users * ScaleFactor2)
or
z = log_score_factor / pow_time_decay / log_community_activity
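A Python sketch of this combined rank, under stated assumptions: the function name `combined_rank` is hypothetical, the default values are placeholders rather than tuned settings (10k is the scale mentioned earlier in the thread; the gravity and second scale factor are guesses), `hours` stands for Time, and the natural logarithm is used because the base is not specified here.

```python
import math


def combined_rank(
    score: int,
    hours: float,
    active_monthly_users: int,
    scale_factor_1: float = 10_000.0,  # assumption: placeholder
    gravity: float = 1.8,              # assumption: placeholder
    scale_factor_2: float = 1.0,       # assumption: placeholder
) -> float:
    """z = log_score_factor / pow_time_decay / log_community_activity."""
    log_score_factor = scale_factor_1 * math.log(max(1, 3 + score))
    pow_time_decay = (hours + 2) ** gravity
    log_community_activity = math.log(1 + active_monthly_users * scale_factor_2)
    if log_community_activity <= 0.0:
        raise ValueError("active_monthly_users * scale_factor_2 must be positive")
    return log_score_factor / pow_time_decay / log_community_activity


# Same score and age, communities with different activity.
print(combined_rank(score=10, hours=1, active_monthly_users=50))
print(combined_rank(score=10, hours=1, active_monthly_users=5000))
```

Because time enters as a power and activity only as a logarithm, the time term dominates as a post ages, which matches the requirement that the time influence stay stronger.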
This way you would force all instances to be about all topics equally. I personally don't like to see so many Shit Reactionaries Say posts from Lemmygrad.ml on Lemmy.ml, but while this would fix that, it creates a bigger problem than the one it's fixing. There has to be a better way.
Block any communities you don't want to see, or use the Local or Subscribed filter to not see federated communities. This issue is completely separate from that.
What I would like is for users to be able to give each community a weight of 0-100 points, represented as 0 to 5 stars, and get a number of posts from each community in their feed proportional to that weight. Otherwise, assign a weight automatically to each community based on each user's interactions with the posts in that community, as a percentage of upvotes vs downvotes.
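One way to read "proportional to the weight" is as a split of feed slots. The sketch below is an illustration only: the function name `allocate_slots` is hypothetical, and handing leftover slots to the largest fractional shares is an assumption the comment does not specify.

```python
def allocate_slots(weights: dict[str, int], feed_size: int) -> dict[str, int]:
    """Split feed_size slots across communities in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one community needs a positive weight")
    exact = {name: feed_size * w / total for name, w in weights.items()}
    slots = {name: int(share) for name, share in exact.items()}
    # Hand the slots lost to rounding down to the largest fractional shares.
    leftover = feed_size - sum(slots.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


print(allocate_slots({"big": 100, "medium": 50, "small": 50}, feed_size=20))
```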
This sounds incredibly complicated for users or admins to do, when all they want is to see posts from both smaller communities and larger ones, without having to explicitly add weighting values for each.
@guillaume-uH57J9's solution is the best way to handle this.
For now, could we add this to the post select (https://github.com/LemmyNet/lemmy/blob/b214d3dc00c269d7987ace7f5522e2ff406eec03/crates/db_views/src/post_view.rs#LL288C1-L288C16)?
ROW_NUMBER() OVER (PARTITION BY post.community_id ORDER BY post_aggregates.score DESC) AS community_rank
I tried with all my might to get this translated into Diesel, but it seems Rust has gotten the better of me.
Explanation: it assigns each post a rank number based on its score within its community. We could then create sorts for Best Day, Best Month, etc. (I can create the sorts.)
@dessalines @Nutomic
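To illustrate what the window function computes, here is a plain Python emulation of the per-community row number. It is a sketch, not Lemmy code: the function name `community_rank` and the dictionary keys are hypothetical, and ties are broken by input order here, whereas SQL leaves tie order unspecified.

```python
from collections import defaultdict


def community_rank(posts: list[dict]) -> dict[int, int]:
    """Return {post id: 1-based rank by score within its community}."""
    by_community = defaultdict(list)
    for post in posts:
        by_community[post["community_id"]].append(post)
    ranks = {}
    for members in by_community.values():
        ordered = sorted(members, key=lambda p: p["score"], reverse=True)
        for position, post in enumerate(ordered, start=1):
            ranks[post["id"]] = position
    return ranks


posts = [
    {"id": 1, "community_id": "big", "score": 900},
    {"id": 2, "community_id": "big", "score": 700},
    {"id": 3, "community_id": "small", "score": 12},
    {"id": 4, "community_id": "small", "score": 30},
]
ranks = community_rank(posts)
# Sort by in-community rank first, then by raw score.
feed = sorted(posts, key=lambda p: (ranks[p["id"]], -p["score"]))
print([p["id"] for p in feed])
```

Sorting on the in-community rank first places the top post of every community ahead of any community's second-best post, regardless of raw score.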
@half-adder you suggest possible scaling, the graph seems like a good start. However I don't understand the need for normalization. Is this required by Lemmy's design?
I don't think you need to normalize, but it made visual comparison of the scaling factors easier
What about balancing instances based on monthly active users instead of communities?
Request for Comments: Balance Scores Based on Monthly Active Users
Thank you! <3