Key metrics model in sqlmesh
What is it?
Currently we have these legacy code_metrics_by_project, and onchain_metrics_by_project computed in dbt.
With the new metrics factory on sqlmesh, we should create new models. Here's my proposal: We just create 3 marts:
-
key_metrics_by_artifact -
key_metrics_by_project -
key_metrics_by_collection
The schema for each of these should look like this:
-
metric_id -
artifact_id,project_id, orcollection_id -
amount -
unit
This is not a time series metric. Instead, each metric_id has a custom way to aggregate all data. For example for commits it might be a sum. For first_commit, it might be just grabbing the earliest date.
This can power the new MetricPage https://github.com/opensource-observer/oso/issues/27 as well as the table in the CollectionPage
After this exists, we can deprecate the older models on dbt.
Yes! I call these "point-in-time" metrics here #2275.
We should also consider adding a sample_date field that represents the last date in the relevant event data.
For example, from this query we can see a lot of issues with the time intervals currently used in our models:
select
m.metric_name,
max(t.amount),
min(t.sample_date) as first_date,
max(t.sample_date) as last_date
from metrics.timeseries_metrics_by_project_v0 as t
join metrics.metrics_v0 as m
on t.metric_id = m.metric_id
join default.projects_v1 as p
on t.project_id = p.project_id
where p.project_name = 'uniswap'
group by 1
order by 4
I think the reason I'm suggesting a new model is because we might want to pre-compute the aggregate (total number of contributors, total number of commits, total number of stars) etc. This is different from "point-in-time" metrics, because we want to aggregate over all time.
But yes, I agree we should store the date that we compute the aggregate on.
Summary from the conversation today.
Plan is to implement this as a sqlmesh model manually outside of the metrics factory. It should be just a variation of related models from timeseries metrics.
In the future we can file issues for automatically generating this from the metrics factory, but probably not worth it yet.