dbt_github
dbt_github copied to clipboard
[Feature] Label History Model
Is there an existing feature request for this?
- [X] I have searched the existing issues
Describe the Feature
We use GitHub Labels to mark important milestones, and folks want to be able to quickly look up:
- When labels were applied to PR's
- How long PR's spent with a Label before the next was applied
Describe alternatives you've considered
I've built out this type of model for Jira Issue Time in Status before, so my current plan is to just re-do that with PR/Issue Labels.
Are you interested in contributing this feature?
- [ ] Yes.
- [X] Yes, but I will need assistance and will schedule time during your office hours for guidance.
- [ ] No.
Anything else?
No response
Hey @jon-openspace thanks for reaching out about this feature request!
I definitely see the value in having historical records of when labels were applied. The Github connector does not support history mode, however I'm assuming you plan to ISSUE_LABEL
table to understand the times in which these labels were added.
It seems that you're open to contributing to this feature which is great news! Would you be able to schedule some time in our office hours so we can discuss specifics of how that will be represented in our package?
Morning! No need for history mode - the Github Connector provides an ISSUE_LABEL_HISTORY
table.
I built a dbt model that gives me what I need yesterday, and I'll schedule some office hours time so you can decide if you want it in this package.
Here's how I do the staging model (I believe this would go into the dbt_github_source package)
with source as (
select * from {{ source('github', 'issue_label_history') }}
),
renamed as (
select
issue_id,
label_id,
updated_at,
labeled,
actor_id,
_fivetran_synced
from source
)
select * from renamed
Here's how I do the "Time in Label" Model. This gets joined to your existing github__pull_requests model. I believe this would only work with Snowflake, so you would need to make it Warehouse Agnostic.
with
label_history as (
select * from {{ ref("stg_github_internal__issue_label_history") }}
),
labels as (select * from {{ ref("stg_github__label") }}),
named_label_history as (
select
label_history.issue_id,
label_history.updated_at as label_changed_ts,
case when labeled = 'True' then 'Added ' else 'Removed ' end
|| labels.label as this_label_change,
dense_rank() over (
partition by label_history.issue_id
order by label_history.updated_at asc
) as label_change_number
from label_history
join labels on label_history.label_id = labels.label_id
),
transitions as (
select
base.*,
next.this_label_change as next_label_change,
datediff(second, base.label_changed_ts, next.label_changed_ts)
/ 60.0 as time_to_next_label_minutes
from named_label_history base
left join
named_label_history next
on (
base.issue_id = next.issue_id
and base.label_change_number = next.label_change_number - 1
)
)
select *
from transitions
Finally, I have an Intermediate Model to Pivot the above model.
with
github_label_history as (select * from {{ ref("github_label_history") }}),
pivoted_values as (
select
issue_id,
{{
dbt_utils.pivot(
"this_label_change",
dbt_utils.get_column_values(
ref("github_label_history"), "this_label_change"
),
agg="MIN",
then_value="label_changed_ts",
prefix="First ",
suffix=" Label",
else_value="NULL",
)
}}
from {{ ref("github_label_history") }}
group by 1
)
select *
from pivoted_values
Thanks @jon-openspace for following up with this after our call!
We'll be taking a look and hopefully bring it in in an upcoming sprint!