metricflow icon indicating copy to clipboard operation
metricflow copied to clipboard

[Bug] Cumulative metrics don't work with time dimensions other than metric time.

Open Jstein77 opened this issue 1 year ago • 9 comments

Is this a new bug in metricflow?

  • [X] I believe this is a new bug in metricflow
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running a query referencing the agg_time_dimension by name makes cumulative metrics ignore the times pine model. For example mf query --metrics cumulative_revenue --group-by order_item__ordered_at,order_id__customer --explain returns the following SQL:

SELECT subq_3.order_item__ordered_at__day AS order_item__ordered_at__day , orders_src_3.customer_id AS order_id__customer , SUM(subq_3.revenue) AS cumulative_revenue FROM ( SELECT cast(ordered_at as DATETIME) AS order_item__ordered_at__day , order_id , product_price AS revenue FROM ANALYTICS.js_dbt_sl_demo.order_items order_item_src_2 ) subq_3 LEFT OUTER JOIN ANALYTICS.js_dbt_sl_demo.orders orders_src_3 ON subq_3.order_id = orders_src_3.order_id GROUP BY subq_3.order_item__ordered_at__

day , orders_src_3.customer_id

Note that there is no time spine join in the SQL. order_item__ordered_at is the agg time dimension for this measure so I would expect to be able to call it. If I replace order_item__ordered_at with metric_time the cumulative metric works as expected.

SELECT subq_5.metric_time__day AS metric_time__day , orders_src_3.customer_id AS order_id__customer , SUM(subq_5.revenue) AS cumulative_revenue FROM ( SELECT subq_4.date_day AS metric_time__day , subq_2.order_id AS order_id , subq_2.revenue AS revenue FROM js_dbt_sl_demo.metricflow_time_spine subq_4 INNER JOIN ( SELECT cast(ordered_at as DATETIME) AS metric_time__day , order_id , product_price AS revenue FROM ANALYTICS.js_dbt_sl_demo.order_items order_item_src_2 ) subq_2 ON ( subq_2.metric_time__day <= subq_4.date_day ) AND ( subq_2.metric_time__day > DATEADD(day, -7, subq_4.date_day) ) ) subq_5 LEFT OUTER JOIN ANALYTICS.js_dbt_sl_demo.orders orders_src_3 ON subq_5.order_id = orders_src_3.order_id GROUP BY subq_5.metric_time__day , orders_src_3.customer_id ORDER BY metric_time__day DESC

Expected Behavior

I expect to be able to call agg_time_dimension by name or metric_time and have the query run. I also expect to be able to reference another time dimension that is not the agg_time_dimension and have the cumulative metric work properly. For example, if I had ordered_at which is the agg_time_dimension for cumulative revenue, and ordered_at_local which shows the local time I would expect to be able to reference either of these time dimensions and see the cumulative revenue at that date.

Steps To Reproduce

In the jaffle-sl-template project run mf query --metrics cumulative_revenue --group-by order_item__ordered_at,order_id__customer --explain. The generated SQL will not join the timespine model.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:
- metricflow:

Which database are you using?

No response

Additional Context

No response

Jstein77 avatar Oct 12 '23 15:10 Jstein77

Any updates?

siljamardla avatar Oct 23 '23 20:10 siljamardla

We're still working to get this bug prioritized. I'll flag this to you once we start working on it!

Jstein77 avatar Oct 23 '23 20:10 Jstein77

So with dbt 1.7 MetricFlow does not have any additional functionality, but you have made it throw an error instead of producing incorrect results? What's the estimate on improved functionality?

siljamardla avatar Nov 14 '23 13:11 siljamardla

Just to emphasise the importance of this topic. I am currently looking at having to duplicate ALL of the semantic models covering facts (dimension models seem to be ok) which also means duplication of ALL the metrics defined on top of these fact models. All of this just to get the flexibility to choose between two different date columns. Duplicating models and metrics means they will go out of sync and we will lose the single source of truth benefit.

siljamardla avatar Nov 14 '23 13:11 siljamardla

Any updates?

siljamardla avatar Jan 03 '24 13:01 siljamardla

@siljamardla we're planning to split this into three different tasks:

  1. You use the column name of the agg time dimension
  2. You use a different time dimension dynamically for the cumulative metric
  3. You don't use a time dimension at all

We're planning to work on #1 first, which should be in the next couple of weeks. #2 and #3 still need to be prioritized, so no update there.

Questions for you since it's been a bit since we chatted. Could you create another measure using ordered_at_local as the agg time dimension as a workaround? This would introduce some duplication, but you shouldn't need to duplicate semantic models.

Jstein77 avatar Jan 03 '24 19:01 Jstein77

We're planning to work on https://github.com/dbt-labs/metricflow/pull/1 first, which should be in the next couple of weeks. https://github.com/dbt-labs/metricflow/pull/2 and https://github.com/dbt-labs/metricflow/pull/3 still need to be prioritized, so no update there.

Should there be links behind the numbers, i.e. are there issues already prepared so we could follow them? The current links take us to PR #1, #2, #3 which are unrelated.

Could you create another measure using ordered_at_local as the agg time dimension as a workaround?

Interesting. How would I specify the agg time dimension on a measure?

    measures:
      - name: rides_orders_local
        expr: 1
        agg: sum
        agg_time_dimension: ordered_at_local

? Never realised this is an option, will try.

siljamardla avatar Jan 17 '24 07:01 siljamardla

Tried adding an agg_time_dimension to a measure, it works. So now I can get rid of duplicated semantic models, but duplicated measures and metrics will remain.

siljamardla avatar Jan 17 '24 11:01 siljamardla

Breaking out the first of the three tasks @Jstein77 mentioned into its own issue here: https://github.com/dbt-labs/metricflow/issues/1000

courtneyholcomb avatar Jan 26 '24 06:01 courtneyholcomb