pontoon Implement data collection for the approval ratio graphs

Reference: specs

We need to collect data about user contributions: approval ratio, self-approval ratio. See specs for more details.

Apr 20 '22 09:04 flodolo

I'm a bit concerned that these ratio graphs would end up hiding valuable absolute-value data. It should be relatively straightforward to use actual data to validate assumptions about, e.g., how month-to-month variance in the number of submitted suggestions affects the resulting ratios.

As a counter-proposal, my gut feeling is that a single stacked bar graph showing approved/self-approved/rejected/unreviewed strings for each month would be more informative.
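
To make the concern concrete, here is a small sketch with made-up numbers (not actual Pontoon data; the month keys and counts are invented for illustration). Two months with very different volumes produce the same ratio:

```python
# Hypothetical monthly review counts, invented for illustration only.
months = {
    "2022-01": {"approved": 1, "rejected": 1},
    "2022-02": {"approved": 500, "rejected": 500},
}


def approval_ratio(counts):
    """Return approved / (approved + rejected), or 0 when nothing was reviewed."""
    total = counts["approved"] + counts["rejected"]
    return 0 if total == 0 else counts["approved"] / total


ratios = {month: approval_ratio(c) for month, c in months.items()}
totals = {month: c["approved"] + c["rejected"] for month, c in months.items()}

# Both months plot as the same point on a ratio graph,
# even though one has 2 reviewed strings and the other 1000.
print(ratios)
print(totals)
```

A ratio-only graph would draw these two months identically, while a stacked bar would show the difference in volume.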

Apr 20 '22 12:04 eemeli

@flodolo What are your thoughts about the alternative chart?

Aug 10 '22 11:08 mathjazz

This issue is about data collection, not graphs (that would be #2487). Is there anything here that would change based on the graph we end up using?

Aug 10 '22 11:08 flodolo

Yeah, we'd need to store different (absolute numbers instead of ratios) and additional (unreviewed) data.

Aug 10 '22 11:08 mathjazz

I would prefer to stick with the proposal in the specs.

The original goal for this part was to quickly get a sense of the quality of contributions (is someone else looking at these translations for a manager? How many are rejected for a new contributor?) and how this changes over time.

With a stacked graph, it would be impossible to see how these values evolve (e.g. approval-ratio if I want to promote someone to translator).

To get a sense of the size of the contribution, there's already the Contribution graph part. The number of unreviewed strings belongs there.

In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?
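
As a rough sketch of what "absolute data + ratio on the fly" could look like: the class and field names below are hypothetical (nothing like this exists in Pontoon as far as this thread shows), and the two ratio definitions are assumptions: approvals over peer reviews, and self-approvals over all approvals.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    """Absolute per-month counts for one contributor (hypothetical structure)."""

    peer_approved: int = 0
    self_approved: int = 0
    rejected: int = 0
    unreviewed: int = 0

    @staticmethod
    def _share(part, other):
        # Share of `part` in `part + other`; 0 when both are 0.
        total = part + other
        return 0 if total == 0 else part / total

    @property
    def approval_ratio(self):
        # Assumed definition: peer approvals / (peer approvals + rejections).
        return self._share(self.peer_approved, self.rejected)

    @property
    def self_approval_ratio(self):
        # Assumed definition: self-approvals / (self-approvals + peer approvals).
        return self._share(self.self_approved, self.peer_approved)


# Made-up counts for a single month.
august = MonthlyCounts(peer_approved=30, self_approved=10, rejected=10, unreviewed=5)
print(august.approval_ratio, august.self_approval_ratio)
```

Keeping only the absolute counts would let either graph (ratio lines or stacked bars) be derived from the same data.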

Aug 10 '22 12:08 flodolo

> In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?

Correct.

Aug 10 '22 12:08 mathjazz

It turns out we can collect the relevant data on each page load fast enough.

No need for cron jobs and storing data in the database.

I'll paste the script I used locally and use it in #2486.

I'll close the issue.

import datetime

from dateutil.relativedelta import relativedelta

from django.contrib.auth.models import User
from django.db.models import Count, F, Q
from django.db.models.functions import TruncMonth
from django.utils import timezone

from pontoon.actionlog.models import ActionLog
from pontoon.base.utils import convert_to_unix_time

today = timezone.now().date()

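# First day of each of the last 25 months (current month included),
# converted with convert_to_unix_time and sorted oldest first.
dates = sorted(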
    [
        convert_to_unix_time(
            datetime.date(today.year, today.month, 1) - relativedelta(months=n)
        )
        for n in range(25)
    ]
)

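def extract_data(qs):
    """Count the actions in `qs` per month, aligned with `dates`.

    Months without any actions keep a count of 0.
    """
    values = [0] * 25
    for item in (
        qs.annotate(created_month=TruncMonth("created_at"))
        .values("created_month")
        .annotate(count=Count("id"))
        .values("created_month", "count")
    ):
        # Map the month of each row to its position in `dates`.
        date = convert_to_unix_time(item["created_month"])
        index = dates.index(date)
        values[index] = item["count"]
    return values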

u = User.objects.get(email="[email protected]")
actions = ActionLog.objects.filter(
    created_at__gte=timezone.now() - relativedelta(years=2),
    translation__user=u,
)

peer_actions = actions.exclude(performed_by=u)
peer_approvals = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
)
peer_rejections = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_REJECTED)
)

self_actions = actions.filter(performed_by=u)
self_approvals = extract_data(
    self_actions.filter(
        # self-approved after submitting suggestions
        Q(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
        # submitted directly as translations
        | Q(
            action_type=ActionLog.ActionType.TRANSLATION_CREATED,
            translation__date=F("translation__approved_date"),
        )
    )
)

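# Share of peer reviews that were approvals: approved / (approved + rejected).
approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(peer_approvals, peer_rejections)
]

# Share of approvals that were self-approvals: self / (self + peer).
self_approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(self_approvals, peer_approvals)
]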
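
For anyone who wants to check the bucketing and ratio logic without a Pontoon database, here is a minimal standalone sketch of the same idea using only the standard library. The helper names (`month_starts`, `bucket_by_month`, `share_of_total`) and the sample dates are hypothetical, plain dates stand in for `ActionLog` rows, and month arithmetic is done by hand instead of with `relativedelta`.

```python
import datetime
from collections import Counter


def month_starts(today, months=25):
    """First day of each of the last `months` months, oldest first."""
    starts = []
    year, month = today.year, today.month
    for _ in range(months):
        starts.append(datetime.date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return starts[::-1]


def bucket_by_month(timestamps, starts):
    """Count timestamps per month, aligned with `starts` (0 for empty months)."""
    counts = Counter(datetime.date(ts.year, ts.month, 1) for ts in timestamps)
    return [counts.get(start, 0) for start in starts]


def share_of_total(first, second):
    """Element-wise first / (first + second), with 0 where both are 0."""
    return [0 if a + b == 0 else a / (a + b) for a, b in zip(first, second)]


# Made-up sample data standing in for action log timestamps.
starts = month_starts(datetime.date(2022, 8, 18))
approvals = bucket_by_month(
    [datetime.date(2022, 8, 1), datetime.date(2022, 8, 17), datetime.date(2022, 7, 3)],
    starts,
)
rejections = bucket_by_month([datetime.date(2022, 8, 2)], starts)
ratios = share_of_total(approvals, rejections)
print(ratios[-2:])
```

The last two entries correspond to July and August 2022 in this example; every earlier month stays at 0 because the sample data has no actions there.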

Aug 18 '22 00:08 mathjazz