
Key metrics model in sqlmesh

Open ryscheng opened this issue 1 year ago • 3 comments

What is it?

Currently, the legacy code_metrics_by_project and onchain_metrics_by_project models are computed in dbt.

With the new metrics factory on sqlmesh, we should create new models. Here's my proposal: we create just three marts:

  • key_metrics_by_artifact
  • key_metrics_by_project
  • key_metrics_by_collection

The schema for each of these should look like this:

  • metric_id
  • artifact_id, project_id, or collection_id
  • amount
  • unit

These are not time-series metrics. Instead, each metric_id has its own way of aggregating all of its data: for commits it might be a sum, while for first_commit it might simply take the earliest date.
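The per-metric aggregation idea can be sketched with a toy example. Everything below (the table layout, metric names, and aggregator mapping) is illustrative only, not the real OSO schema:

```python
import sqlite3

# Hypothetical miniature of a timeseries metrics table; names are
# illustrative, not the actual OSO models.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table timeseries_metrics (
        metric_name text,
        project_id text,
        sample_date text,
        amount real
    )
""")
conn.executemany(
    "insert into timeseries_metrics values (?, ?, ?, ?)",
    [
        ("commits", "proj_a", "2024-01-01", 10),
        ("commits", "proj_a", "2024-02-01", 5),
        ("first_commit", "proj_a", "2019-06-15", 1),
        ("first_commit", "proj_a", "2020-01-02", 1),
    ],
)

# Each metric gets its own aggregation over all time: commits are
# summed, first_commit takes the earliest date.
AGGREGATORS = {
    "commits": "sum(amount)",
    "first_commit": "min(sample_date)",
}

results = {}
for metric, agg in AGGREGATORS.items():
    row = conn.execute(
        f"select {agg} from timeseries_metrics where metric_name = ?",
        (metric,),
    ).fetchone()
    results[metric] = row[0]

print(results)  # commits summed to 15.0, first_commit resolved to 2019-06-15
```

The point of the mapping is that the key-metrics mart cannot use a single aggregate function; the aggregation rule is part of each metric's definition.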

This can power the new MetricPage (https://github.com/opensource-observer/oso/issues/27) as well as the table in the CollectionPage.

Once this exists, we can deprecate the older dbt models.

ryscheng avatar Sep 28 '24 01:09 ryscheng

Yes! I call these "point-in-time" metrics in #2275.

We should also consider adding a sample_date field that represents the last date in the relevant event data.

For example, from this query we can see a lot of issues with the time intervals currently used in our models:

select
    m.metric_name,
    max(t.amount) as max_amount,
    min(t.sample_date) as first_date,
    max(t.sample_date) as last_date
from metrics.timeseries_metrics_by_project_v0 as t
join metrics.metrics_v0 as m
    on t.metric_id = m.metric_id
join default.projects_v1 as p
    on t.project_id = p.project_id
where p.project_name = 'uniswap'
group by 1
order by last_date

[image: query results]

ccerv1 avatar Sep 30 '24 12:09 ccerv1

The reason I'm suggesting a new model is that we might want to pre-compute the aggregates (total number of contributors, total number of commits, total number of stars, etc.). This differs from "point-in-time" metrics because we want to aggregate over all time.

But yes, I agree we should store the date on which we compute the aggregate.

ryscheng avatar Sep 30 '24 16:09 ryscheng

Summary from the conversation today.

The plan is to implement this manually as a sqlmesh model, outside of the metrics factory. It should just be a variation of the related timeseries metrics models.
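A hand-written mart along these lines might look like the sketch below. The MODEL block fields, the source table name, and the aggregations are assumptions based on this thread, not actual OSO code; in particular, sum() stands in for whatever per-metric aggregation each metric_id defines:

```sql
-- Sketch only: sum() is a placeholder for the per-metric aggregation
-- (e.g. min(sample_date) for first_commit).
MODEL (
  name metrics.key_metrics_by_project_v0,
  kind FULL
);

select
  t.metric_id,
  t.project_id,
  sum(t.amount) as amount,
  max(t.sample_date) as sample_date
from metrics.timeseries_metrics_by_project_v0 as t
group by t.metric_id, t.project_id
```

The sample_date column here follows the earlier suggestion of recording the last date in the relevant event data alongside each aggregate.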

In the future we can file issues for generating this automatically from the metrics factory, but it's probably not worth it yet.

ryscheng avatar Oct 02 '24 23:10 ryscheng