metricflow
metricflow copied to clipboard
[Bug] Cumulative metrics don't work with time dimensions other than metric time.
Is this a new bug in metricflow?
- [X] I believe this is a new bug in metricflow
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Running a query referencing the agg_time_dimension
by name makes cumulative metrics ignore the times pine model. For example mf query --metrics cumulative_revenue --group-by order_item__ordered_at,order_id__customer --explain
returns the following SQL:
SELECT subq_3.order_item__ordered_at__day AS order_item__ordered_at__day , orders_src_3.customer_id AS order_id__customer , SUM(subq_3.revenue) AS cumulative_revenue FROM ( SELECT cast(ordered_at as DATETIME) AS order_item__ordered_at__day , order_id , product_price AS revenue FROM ANALYTICS.js_dbt_sl_demo.order_items order_item_src_2 ) subq_3 LEFT OUTER JOIN ANALYTICS.js_dbt_sl_demo.orders orders_src_3 ON subq_3.order_id = orders_src_3.order_id GROUP BY subq_3.order_item__ordered_at__
day , orders_src_3.customer_id
Note that there is no time spine join in the SQL. order_item__ordered_at
is the agg time dimension for this measure so I would expect to be able to call it. If I replace order_item__ordered_at
with metric_time
the cumulative metric works as expected.
SELECT subq_5.metric_time__day AS metric_time__day , orders_src_3.customer_id AS order_id__customer , SUM(subq_5.revenue) AS cumulative_revenue FROM ( SELECT subq_4.date_day AS metric_time__day , subq_2.order_id AS order_id , subq_2.revenue AS revenue FROM js_dbt_sl_demo.metricflow_time_spine subq_4 INNER JOIN ( SELECT cast(ordered_at as DATETIME) AS metric_time__day , order_id , product_price AS revenue FROM ANALYTICS.js_dbt_sl_demo.order_items order_item_src_2 ) subq_2 ON ( subq_2.metric_time__day <= subq_4.date_day ) AND ( subq_2.metric_time__day > DATEADD(day, -7, subq_4.date_day) ) ) subq_5 LEFT OUTER JOIN ANALYTICS.js_dbt_sl_demo.orders orders_src_3 ON subq_5.order_id = orders_src_3.order_id GROUP BY subq_5.metric_time__day , orders_src_3.customer_id ORDER BY metric_time__day DESC
Expected Behavior
I expect to be able to call agg_time_dimension
by name or metric_time
and have the query run. I also expect to be able to reference another time dimension that is not the agg_time_dimension
and have the cumulative metric work properly. For example, if I had ordered_at
which is the agg_time_dimension
for cumulative revenue, and ordered_at_local
which shows the local time I would expect to be able to reference either of these time dimensions and see the cumulative revenue at that date.
Steps To Reproduce
In the jaffle-sl-template project run mf query --metrics cumulative_revenue --group-by order_item__ordered_at,order_id__customer --explain
. The generated SQL will not join the timespine model.
Relevant log output
No response
Environment
- OS:
- Python:
- dbt:
- metricflow:
Which database are you using?
No response
Additional Context
No response
Any updates?
We're still working to get this bug prioritized. I'll flag this to you once we start working on it!
So with dbt 1.7 MetricFlow does not have any additional functionality, but you have made it throw an error instead of producing incorrect results? What's the estimate on improved functionality?
Just to emphasise the importance of this topic. I am currently looking at having to duplicate ALL of the semantic models covering facts (dimension models seem to be ok) which also means duplication of ALL the metrics defined on top of these fact models. All of this just to get the flexibility to choose between two different date columns. Duplicating models and metrics means they will go out of sync and we will lose the single source of truth benefit.
Any updates?
@siljamardla we're planning to split this into three different tasks:
- You use the column name of the agg time dimension
- You use a different time dimension dynamically for the cumulative metric
- You don't use a time dimension at all
We're planning to work on #1 first, which should be in the next couple of weeks. #2 and #3 still need to be prioritized, so no update there.
Questions for you since it's been a bit since we chatted. Could you create another measure using ordered_at_local
as the agg time dimension as a workaround? This would introduce some duplication, but you shouldn't need to duplicate semantic models.
We're planning to work on https://github.com/dbt-labs/metricflow/pull/1 first, which should be in the next couple of weeks. https://github.com/dbt-labs/metricflow/pull/2 and https://github.com/dbt-labs/metricflow/pull/3 still need to be prioritized, so no update there.
Should there be links behind the numbers, i.e. are there issues already prepared so we could follow them? The current links take us to PR #1, #2, #3 which are unrelated.
Could you create another measure using ordered_at_local as the agg time dimension as a workaround?
Interesting. How would I specify the agg time dimension on a measure?
measures:
- name: rides_orders_local
expr: 1
agg: sum
agg_time_dimension: ordered_at_local
? Never realised this is an option, will try.
Tried adding an agg_time_dimension
to a measure, it works. So now I can get rid of duplicated semantic models, but duplicated measures and metrics will remain.
Breaking out the first of the three tasks @Jstein77 mentioned into its own issue here: https://github.com/dbt-labs/metricflow/issues/1000