dbt_github icon indicating copy to clipboard operation
dbt_github copied to clipboard

[Feature] Label History Model

Open jon-openspace opened this issue 1 year ago • 6 comments

Is there an existing feature request for this?

  • [X] I have searched the existing issues

Describe the Feature

We use GitHub Labels to mark important milestones, and folks want to be able to quickly look up:

  1. When labels were applied to PR's
  2. How long PR's spent with a Label before the next was applied

Describe alternatives you've considered

I've built out this type of model for Jira Issue Time in Status before, so my current plan is to just re-do that with PR/Issue Labels.

Are you interested in contributing this feature?

  • [ ] Yes.
  • [X] Yes, but I will need assistance and will schedule time during your office hours for guidance.
  • [ ] No.

Anything else?

No response

jon-openspace avatar Mar 30 '23 19:03 jon-openspace

Hey @jon-openspace thanks for reaching out about this feature request!

I definitely see the value in having historical records of when labels were applied. The Github connector does not support history mode, however I'm assuming you plan to ISSUE_LABEL table to understand the times in which these labels were added.

It seems that you're open to contributing to this feature which is great news! Would you be able to schedule some time in our office hours so we can discuss specifics of how that will be represented in our package?

elanfivetran avatar Mar 31 '23 00:03 elanfivetran

Morning! No need for history mode - the Github Connector provides an ISSUE_LABEL_HISTORY table. image I built a dbt model that gives me what I need yesterday, and I'll schedule some office hours time so you can decide if you want it in this package.

jon-openspace avatar Mar 31 '23 15:03 jon-openspace

Here's how I do the staging model (I believe this would go into the dbt_github_source package)

with source as (

    select * from {{ source('github', 'issue_label_history') }}

),

renamed as (

    select
        issue_id,
        label_id,
        updated_at,
        labeled,
        actor_id,
        _fivetran_synced

    from source

)

select * from renamed


jon-openspace avatar Apr 04 '23 19:04 jon-openspace

Here's how I do the "Time in Label" Model. This gets joined to your existing github__pull_requests model. I believe this would only work with Snowflake, so you would need to make it Warehouse Agnostic.


with

    label_history as (
        select * from {{ ref("stg_github_internal__issue_label_history") }}
    ),

    labels as (select * from {{ ref("stg_github__label") }}),

    named_label_history as (
        select
            label_history.issue_id,
            label_history.updated_at                                                        as label_changed_ts,
            case when labeled = 'True' then 'Added ' else 'Removed ' end
            || labels.label                                                                 as this_label_change,
            dense_rank() over (
                partition by label_history.issue_id
                order by label_history.updated_at asc
            )                                                                               as label_change_number
        from label_history
        join labels on label_history.label_id = labels.label_id
    ),

    transitions as (
        select
            base.*,
            next.this_label_change                                                          as next_label_change,
            datediff(second, base.label_changed_ts, next.label_changed_ts)
            / 60.0                                                                          as time_to_next_label_minutes
        from named_label_history base
        left join
            named_label_history next
            on (
                base.issue_id = next.issue_id
                and base.label_change_number = next.label_change_number - 1
            )
    )

select *
from transitions

jon-openspace avatar Apr 04 '23 19:04 jon-openspace

Finally, I have an Intermediate Model to Pivot the above model.


with
    github_label_history as (select * from {{ ref("github_label_history") }}),

    pivoted_values as (
        select
            issue_id,
            {{
                dbt_utils.pivot(
                    "this_label_change",
                    dbt_utils.get_column_values(
                        ref("github_label_history"), "this_label_change"
                    ),
                    agg="MIN",
                    then_value="label_changed_ts",
                    prefix="First ",
                    suffix=" Label",
                    else_value="NULL",
                )
            }}
        from {{ ref("github_label_history") }}
        group by 1
    )

select *
from pivoted_values

jon-openspace avatar Apr 04 '23 19:04 jon-openspace

Thanks @jon-openspace for following up with this after our call!

We'll be taking a look and hopefully bring it in in an upcoming sprint!

fivetran-reneeli avatar Apr 05 '23 16:04 fivetran-reneeli