metricflow icon indicating copy to clipboard operation
metricflow copied to clipboard

[Bug] Adding a dimension in `--where` but not in `--group-by` of an MF query will output dysfunctional SQL

Open siljamardla opened this issue 5 months ago • 0 comments

Is this a new bug in metricflow?

  • [X] I believe this is a new bug in metricflow
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Here's a query to get my_metric value, grouped by user name and filtered for country (a valid dimension):

mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name

This produces SQL that has the following WHERE clause

WHERE city_id__country_id__country_code='fr'

but it does not have a join to the dimension table dim_city (nor to dim_country, which is also needed). Therefore, the SQL cannot be run.

There is a workaround: by adding the same dimension in the --group-by like this:

mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name,city_id__country_id__country_code

the output SQL will have the necessary joins and the WHERE clause becomes valid.

Expected Behavior

MetricFlow should be able to recognise dimensions in --where clauses and add the necessary joins.

At the moment the WHERE clause can be arbitrary and compiled of multiple conditions, even complex things with brackets and whatnot. I agree that this solution allows for the most flexibility and the option should remain. However, there should also be a more controlled way of adding where clauses that relies on the semantics that we have defined and makes sure the outcome is valid SQL.

Steps To Reproduce

I have not tested if the issue only happens due to this particular dimension needing two joins (to dim_city and to dim_country). I suspect not, because the output SQL did not even have a join to dim_city either.

To reproduce, it should be enough to try a --where clause with a dimension and without a --group-by for the same dimension.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt: 1.8
- metricflow:

Which database are you using?

other (mention it in "Additional Context")

Additional Context

spark/databricks

siljamardla avatar Sep 05 '24 06:09 siljamardla