Transparency of popularity scoring algorithm used in trending posts, hashtags, people, news

Open · JohannesBuchner opened this issue 1 month ago · 0 comments

Given that Mastodon tries to avoid algorithms that impose preferences on users, it would be good to thoroughly document the algorithms used in the "Explore" sections.

Currently, the documentation at https://docs.joinmastodon.org/methods/trends/ makes only vague statements:

  • "View hashtags that are currently being used more frequently than usual."
  • "Links that have been shared more than others."
  • "Tags that are being used more frequently within the past week." (This also looks incorrect: the tag code below compares today's usage with the previous day's, not with the past week.)

The links at the bottom of that page lead to code with no comments.

What is the threshold? How is frequency determined? How is the pool of candidate hashtags selected?

After digging for an hour, I found these functions in the code:

  • score for account: https://github.com/mastodon/mastodon/blob/main/app/services/account_search_service.rb#L96
    • by number of followers (followers_score_function)
    • by "reputation" = followers_count / (followers_count + following_count) (reputation_score_function)
    • by deranking accounts that haven't posted in a long time (time_distance_function)
    • it is unclear how these are combined; perhaps this is configurable in the admin interface?
  • score for status: https://github.com/mastodon/mastodon/blob/main/app/models/trends/statuses.rb#L112
      expected  = 1.0
      observed  = (status.reblogs_count + status.favourites_count).to_f

      score = if expected > observed || observed < options[:threshold]
                0
              else
                ((observed - expected)**2) / expected
              end

      decaying_score = if score.zero? || !eligible?(status)
                         0
                       else
                         score * (0.5**((at_time.to_f - status.created_at.to_f) / options[:score_halflife].to_f))
                       end
  • score for tag: https://github.com/mastodon/mastodon/blob/main/app/models/trends/tags.rb#L55
      expected  = tag.history.get(at_time - 1.day).accounts.to_f
      expected  = 1.0 if expected.zero?
      observed  = tag.history.get(at_time).accounts.to_f
      max_time  = tag.max_score_at
      max_score = tag.max_score
      max_score = 0 if max_time.nil? || max_time < (at_time - options[:max_score_cooldown])

      score = if expected > observed || observed < options[:threshold]
                0
              else
                ((observed - expected)**2) / expected
              end

      if score > max_score
        max_score = score
        max_time  = at_time

        # Not interested in triggering any callbacks for this
        tag.update_columns(max_score: max_score, max_score_at: max_time)
      end

      decaying_score = max_score * (0.5**((at_time.to_f - max_time.to_f) / options[:max_score_halflife].to_f))

      next unless decaying_score >= options[:decay_threshold]

      items << { score: decaying_score, item: tag }
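For anyone who wants to experiment with it, the tag excerpt can be restated as a standalone Ruby function. This is a sketch, not Mastodon's code: TagState, OPTIONS and tag_score are hypothetical names, the option values are placeholders chosen for illustration, and the stored max_score/max_score_at columns are modelled as plain struct fields.

```ruby
# Hypothetical restatement of the tag scoring excerpt; not Mastodon's code.
# TagState stands in for a tag's daily history and its stored max_score columns.
TagState = Struct.new(:accounts_today, :accounts_yesterday, :max_score, :max_score_at)

# Option names follow the excerpt; the values are placeholders for illustration.
OPTIONS = {
  threshold: 5,
  max_score_cooldown: 2 * 24 * 3600, # seconds
  max_score_halflife: 2 * 3600,      # seconds
  decay_threshold: 1
}.freeze

# Returns the decayed score at at_time (epoch seconds), or nil if the tag
# would be skipped. Updates state in place, as update_columns does upstream.
def tag_score(state, at_time, opts = OPTIONS)
  expected = state.accounts_yesterday.to_f
  expected = 1.0 if expected.zero?
  observed = state.accounts_today.to_f

  max_time  = state.max_score_at
  max_score = state.max_score || 0
  # Forget an old peak once the cooldown has passed.
  max_score = 0 if max_time.nil? || max_time < at_time - opts[:max_score_cooldown]

  # Squared growth over yesterday, divided by yesterday (a chi-square-like term);
  # zero if usage did not grow or is below the threshold.
  score = if expected > observed || observed < opts[:threshold]
            0
          else
            ((observed - expected)**2) / expected
          end

  # Remember a new peak.
  if score > max_score
    max_score = score
    max_time  = at_time
    state.max_score    = max_score
    state.max_score_at = max_time
  end

  # The peak halves every max_score_halflife seconds after it was reached.
  decaying = max_score * (0.5**((at_time.to_f - max_time.to_f) / opts[:max_score_halflife].to_f))
  decaying >= opts[:decay_threshold] ? decaying : nil
end
```

With 20 accounts today and 4 yesterday, the tag scores (20 - 4)² / 4 = 64; two hours later (one half-life) with unchanged counts, it is listed at 32. Unlike the post score, the decay here runs from the time of the tag's peak (max_score_at), not from when the tag was created.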

From this, I can see that the admin can alter the behaviour through options (threshold, score_halflife, max_score_cooldown, max_score_halflife, decay_threshold).

I think it would be nice to be transparent about the algorithms used, both to users and to developers.

I suggest two improvements:

  1. On the "Explore" page, add a "more information" link to the popup that reads "These are posts from across the social web that are gaining traction today. Newer posts with more boosts and favorites are ranked higher." The link should lead to a documentation page that presents the algorithm configuration used on this instance. Do the same for each of the Posts, Hashtags, People, and News tabs.
  2. On that documentation page, describe the algorithm together with the option values configured on this instance.
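The second suggestion could be generated from the instance's actual option values rather than written by hand. A minimal sketch using Ruby's standard ERB library follows; DETAILS_TEMPLATE and render_trend_details are hypothetical names, and how an instance would read its live trend options is left open.

```ruby
require "erb"

# Hypothetical template for a per-instance "click for details" section.
# Every interpolated value is HTML-escaped before it reaches the page.
DETAILS_TEMPLATE = <<~HTML
  <details><summary><%= ERB::Util.html_escape(summary.to_s) %> (click for details)</summary>
  <p>
  This instance uses the algorithm below with the following options:
  <ul>
  <% options.each do |name, value| -%>
    <li><%= ERB::Util.html_escape(name.to_s) %> = <%= ERB::Util.html_escape(value.to_s) %></li>
  <% end -%>
  </ul>
  </details>
HTML

# Renders the template; trim_mode "-" drops the newlines after the loop tags.
def render_trend_details(summary:, options:)
  ERB.new(DETAILS_TEMPLATE, trim_mode: "-").result_with_hash(summary: summary, options: options)
end
```

For example, `render_trend_details(summary: "Trending posts", options: { threshold: 100, score_halflife: "1234s" })` produces one `<li>` per option, so the page stays in sync with whatever values the instance passes in.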

For the posts, this could look something like:

<details><summary>These are posts from across the social web that are gaining traction today. Newer posts with more boosts and favorites are ranked higher (click for details)</summary>
<p>
This instance uses the algorithm below with the following options:
 <ul>
     <li>threshold = 100
     <li>score_halflife = 1234s
  </ul>
The popularity score of an eligible post is computed from its number of reblogs and favourites and its age in seconds:
<pre>
expected = 1.0
observed = reblogs_count + favourites_count
if expected > observed or observed < threshold:
    score = 0
else:
    score = ((observed - expected)**2) / expected * (0.5**(age / score_halflife))
</pre>
</details>
  • I used an HTML collapsible section, as shown in https://dev.to/jordanfinners/creating-a-collapsible-section-with-nothing-but-html-4ip9, so that no additional full page needs to be added.
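The post pseudocode in the collapsible can also be written as a small runnable Ruby function, which makes it easy to check what a given option set implies. This sketch assumes the example values given in the collapsible (threshold = 100, score_halflife = 1234 s); post_score and EXAMPLE_OPTIONS are hypothetical names, and the eligible? check from the original code is not modelled.

```ruby
# Illustrative values taken from the example collapsible, not a real instance's configuration.
EXAMPLE_OPTIONS = { threshold: 100, score_halflife: 1234.0 }.freeze

# reblogs, favourites: counts for an eligible post; age: seconds since it was posted.
def post_score(reblogs, favourites, age, opts = EXAMPLE_OPTIONS)
  expected = 1.0
  observed = (reblogs + favourites).to_f
  return 0.0 if expected > observed || observed < opts[:threshold]

  # Squared excess over the fixed baseline, halved every score_halflife seconds.
  ((observed - expected)**2) / expected * (0.5**(age / opts[:score_halflife]))
end
```

For instance, a fresh post with 60 reblogs and 41 favourites scores (101 - 1)² = 10000, and 5000 after 1234 seconds, while one with 99 interactions scores 0. Because expected is fixed at 1.0, with any threshold of at least 1 the baseline never filters anything out; only the threshold decides which posts qualify.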

JohannesBuchner, Jun 03 '24 05:06