metricflow
metricflow copied to clipboard
[Bug] Adding a dimension in `--where` but not in `--group-by` of an MF query will output dysfunctional SQL
Is this a new bug in metricflow?
- [X] I believe this is a new bug in metricflow
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Here's a query to get my_metric
value, grouped by user name and filtered for country (a valid dimension):
mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name
This produces SQL that has the following WHERE clause
WHERE city_id__country_id__country_code='fr'
but it does not have a join to the dimension table dim_city
(nor to dim_country
, which is also needed). Therefore, the SQL cannot be run.
There is a workaround: by adding the same dimension in the --group-by
like this:
mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name,city_id__country_id__country_code
the output SQL will have the necessary joins and the WHERE
clause becomes valid.
Expected Behavior
MetricFlow should be able to recognise dimensions in --where
clauses and add the necessary joins.
At the moment the WHERE clause can be arbitrary and compiled of multiple conditions, even complex things with brackets and whatnot. I agree that this solution allows for the most flexibility and the option should remain. However, there should also be a more controlled way of adding where clauses that relies on the semantics that we have defined and makes sure the outcome is valid SQL.
Steps To Reproduce
I have not tested if the issue only happens due to this particular dimension needing two joins (to dim_city
and to dim_country
). I suspect not, because the output SQL did not even have a join to dim_city
either.
To reproduce, it should be enough to try a --where
clause with a dimension and without a --group-by
for the same dimension.
Relevant log output
No response
Environment
- OS:
- Python:
- dbt: 1.8
- metricflow:
Which database are you using?
other (mention it in "Additional Context")
Additional Context
spark/databricks