pontoon

Implement data collection for the approval ratio graphs

Open flodolo opened this issue 3 years ago • 6 comments

Reference: specs

We need to collect data about user contributions: approval ratio, self-approval ratio. See specs for more details.

flodolo, Apr 20 '22 09:04

I'm a bit concerned that these ratio graphs would end up hiding valuable absolute-value data. It should be relatively straightforward to validate assumptions against actual data, e.g. how month-to-month variance in the number of submitted suggestions affects the resulting ratios.

As a counter-proposal, my gut feeling is that a single stacked bar graph showing approved/self-approved/rejected/unreviewed strings for each month would be more informative.
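To make the concern concrete, here is a minimal sketch with made-up monthly counts (the `MonthCounts` type, the helper names and the numbers are hypothetical, not Pontoon data). The approval ratio follows the definition used in the script later in this thread, peer approvals over peer approvals plus rejections. Two months give the same ratio while their stacked bars differ by two orders of magnitude.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MonthCounts:
    """Hypothetical absolute counts for one month of a contributor's suggestions."""
    approved: int       # approved by someone else
    self_approved: int
    rejected: int
    unreviewed: int


def approval_ratio(m: MonthCounts) -> float:
    """Peer approvals over peer-reviewed suggestions; 0.0 if nothing was reviewed."""
    reviewed = m.approved + m.rejected
    return 0.0 if reviewed == 0 else m.approved / reviewed


def stacked_bar(m: MonthCounts) -> Dict[str, int]:
    """Segments of one monthly stacked bar: absolute values, nothing collapsed."""
    return {
        "approved": m.approved,
        "self-approved": m.self_approved,
        "rejected": m.rejected,
        "unreviewed": m.unreviewed,
    }


# Two made-up months: same ratio, very different volume.
quiet = MonthCounts(approved=2, self_approved=0, rejected=2, unreviewed=1)
busy = MonthCounts(approved=200, self_approved=10, rejected=200, unreviewed=300)

print(approval_ratio(quiet), approval_ratio(busy))  # 0.5 0.5
print(stacked_bar(quiet))
print(stacked_bar(busy))
```

The ratio alone cannot tell the two months apart; the stacked bar keeps the volume visible at the cost of making the ratio harder to read off directly.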

eemeli, Apr 20 '22 12:04

@flodolo What are your thoughts about the alternative chart?

mathjazz, Aug 10 '22 11:08

This issue is about data collection, not graphs (that would be #2487). Is there anything here that would change based on the graph we end up using?

flodolo, Aug 10 '22 11:08

Yeah, we'd need to store different data (absolute numbers instead of ratios) as well as additional data (unreviewed counts).

mathjazz, Aug 10 '22 11:08

I would prefer to stick with the proposal in the specs.

The original goal for this part was to quickly get a sense of the quality of contributions (is someone else looking at these translations for a manager? How many are rejected for a new contributor?) and how this changes over time.

With a stacked graph, it would be impossible to see how these values evolve (e.g. the approval ratio, if I want to promote someone to translator).

To get a sense of the size of the contribution, there's already the Contribution graph part. The number of unreviewed suggestions belongs there.

In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?
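Persisting only absolute counts is sufficient, since both ratios can be derived when the page is rendered. A minimal sketch, assuming hypothetical per-month rows of (peer approvals, peer rejections, self-approvals) and assuming the self-approval ratio is self-approvals divided by all approvals (self + peer); `stored`, `safe_ratio` and `ratios_on_the_fly` are illustrative names, not Pontoon code:

```python
from typing import Dict, List, Tuple

# Hypothetical stored rows: month -> (peer_approvals, peer_rejections, self_approvals).
# Only absolute numbers are persisted; no ratios are stored.
stored: Dict[str, Tuple[int, int, int]] = {
    "2022-06": (8, 2, 0),
    "2022-07": (0, 0, 5),
    "2022-08": (3, 1, 3),
}


def safe_ratio(part: int, rest: int) -> float:
    """part / (part + rest), or 0.0 when both are zero."""
    total = part + rest
    return 0.0 if total == 0 else part / total


def ratios_on_the_fly(
    rows: Dict[str, Tuple[int, int, int]]
) -> Tuple[List[str], List[float], List[float]]:
    """Derive both ratio series from absolute counts, oldest month first."""
    months = sorted(rows)
    approval = [safe_ratio(rows[m][0], rows[m][1]) for m in months]
    # Assumed definition: self-approvals / (self-approvals + peer approvals)
    self_approval = [safe_ratio(rows[m][2], rows[m][0]) for m in months]
    return months, approval, self_approval


months, approval, self_approval = ratios_on_the_fly(stored)
print(approval)       # [0.8, 0.0, 0.75]
print(self_approval)  # [0.0, 1.0, 0.5]
```

Months without any reviews yield 0.0 instead of raising `ZeroDivisionError`, mirroring the `0 if sum(pair) == 0` guard in the script later in this thread.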

flodolo, Aug 10 '22 12:08

> In terms of data, I guess nothing prevents us from storing absolute data + ratio (or calculating the ratio on the fly) if we want?

Correct.

mathjazz, Aug 10 '22 12:08

It turns out we can collect the relevant data on each page load fast enough.

No need for cron jobs or for storing data in the database.

I'll paste the script I used locally and use it in #2486.

I'll close the issue.

import datetime

from dateutil.relativedelta import relativedelta

from django.contrib.auth.models import User
from django.db.models import Count, F, Q
from django.db.models.functions import TruncMonth
from django.utils import timezone

from pontoon.actionlog.models import ActionLog
from pontoon.base.utils import convert_to_unix_time

today = timezone.now().date()

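# First day of each of the last 25 months (current month included),
# as Unix timestamps, sorted oldest first
dates = sorted(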
    [
        convert_to_unix_time(
            datetime.date(today.year, today.month, 1) - relativedelta(months=n)
        )
        for n in range(25)
    ]
)

def extract_data(qs):
    """Return per-month counts of the actions in qs, aligned with `dates`."""
    values = [0] * 25
    for item in (
        qs.annotate(created_month=TruncMonth("created_at"))
        .values("created_month")
        .annotate(count=Count("id"))
        .values("created_month", "count")
    ):
        date = convert_to_unix_time(item["created_month"])
        index = dates.index(date)
        values[index] = item["count"]
    return values

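# The contributor whose translations are analyzed
u = User.objects.get(email="[email protected]")
# Actions on the contributor's translations from the last two years
# (the oldest month in `dates` is only partly covered)
actions = ActionLog.objects.filter(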
    created_at__gte=timezone.now() - relativedelta(years=2),
    translation__user=u,
)

# Reviews of the contributor's translations done by other users
peer_actions = actions.exclude(performed_by=u)
peer_approvals = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
)
peer_rejections = extract_data(
    peer_actions.filter(action_type=ActionLog.ActionType.TRANSLATION_REJECTED)
)

# Actions the contributor performed on their own translations
self_actions = actions.filter(performed_by=u)
self_approvals = extract_data(
    self_actions.filter(
        # self-approved after submitting suggestions
        Q(action_type=ActionLog.ActionType.TRANSLATION_APPROVED)
        # submitted directly as translations
        | Q(
            action_type=ActionLog.ActionType.TRANSLATION_CREATED,
            translation__date=F("translation__approved_date"),
        )
    )
)

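# Share of peer-reviewed translations that were approved:
# peer approvals / (peer approvals + peer rejections)
approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(peer_approvals, peer_rejections)
]

# Share of approvals the contributor made themselves:
# self-approvals / (self-approvals + peer approvals)
self_approval_ratio = [
    0 if sum(pair) == 0 else (pair[0] / sum(pair))
    for pair in zip(self_approvals, peer_approvals)
]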

mathjazz, Aug 18 '22 00:08
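The month bucketing done by `extract_data` in the script above can be exercised without Django or a database. Below is a minimal pure-Python sketch; the helper names `month_starts` and `bucket_by_month` are hypothetical. It builds the same 25-month window (oldest first) using plain `date` objects instead of Unix timestamps, works on naive datetimes (time zones are ignored, unlike `TruncMonth`), and uses a dict lookup instead of `dates.index()`, so a timestamp outside the window is skipped rather than raising `ValueError`.

```python
import datetime
from typing import Dict, Iterable, List


def month_starts(today: datetime.date, n: int = 25) -> List[datetime.date]:
    """First day of each of the last n months (current month included), oldest first."""
    base = today.year * 12 + (today.month - 1)  # months since year 0
    return [
        datetime.date(idx // 12, idx % 12 + 1, 1)
        for idx in range(base - n + 1, base + 1)
    ]


def bucket_by_month(
    timestamps: Iterable[datetime.datetime], today: datetime.date, n: int = 25
) -> List[int]:
    """Count timestamps per month, aligned with month_starts(today, n)."""
    position: Dict[datetime.date, int] = {
        m: i for i, m in enumerate(month_starts(today, n))
    }
    counts = [0] * n
    for ts in timestamps:
        key = datetime.date(ts.year, ts.month, 1)
        if key in position:  # skip anything outside the window
            counts[position[key]] += 1
    return counts


counts = bucket_by_month(
    [datetime.datetime(2022, 8, 1, 10), datetime.datetime(2022, 7, 31, 23)],
    today=datetime.date(2022, 8, 18),
)
print(counts[-2:])  # [1, 1]
```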