Fix #23449

Background

How we go from simple MBQL query -> native query

Let's go through the life of a query and how it traverses through some of our middleware. Our concern here is how we get from a simple MBQL query, to a full native query that knows which fields to select, and which tables to select from.

When we create a query that uses a saved questions/model as a card, the query will look like this:

{
  :query {
    :source-table "card__6"
  }
}

The add-source-metadata-for-source-queries middleware will find the card__6, and update the query with the metadata of the source query

{
  :query {
-  :source-table "card__6",
+  :source-card-id 6,
+  :source-query {:source-table 4},
+  :source-metadata    ;; The metadata from the source questions
+  [{:description "A unique internal identifier for the review. Should not be used externally.",
+    :semantic_type :type/PK,
+    :table_id 4,
+    :coercion_strategy nil,
+    :name "ID",
+    :settings nil,
+    :source :fields,
+    :field_ref [:field 36 nil],
+    :effective_type :type/BigInteger,
+    :nfc_path nil,
+    :parent_id nil,
+    :id 36,
+    :visibility_type :normal,
+    :display_name "ID",
+    :fingerprint nil,
+    :base_type :type/BigInteger}
+   ...
+   ]}
}

Because we don't specify which fields we're selecting, the add-implicit-clauses will assume that we select all available columns in the source questions. It basically will loop the source-metadata and create a fields array.

{
  :query {:source-card-id 6,
  :source-query {:source-table 4},
+  :fields
+   [[:field 36 nil]
+    [:field 33 nil]
+    ...]
  :source-metadata  ;; The metadata from the source questions
  [{:description "A unique internal identifier for the review. Should not be used externally.",
    :semantic_type :type/PK,
    :table_id 4,
    :coercion_strategy nil,
    :name "ID",
    :settings nil,
    :source :fields,
    :field_ref [:field 36 nil],
    :effective_type :type/BigInteger,
    :nfc_path nil,
    :parent_id nil,
    :id 36,
    :visibility_type :normal,
    :display_name "ID",
    :fingerprint nil,
    :base_type :type/BigInteger}
   ...
   ]}
}

From here, the mbql-to-native middleware will generate a query that:

select fields defined in the fields array
from the source query

How we deal with remapping columns

When query a field that has a remapping, we need to do 2 things:

returning an additional column in the result that contains the mapped values
letting FE knows which is the original column, and which column is the mapped column so that FE could display accordingly

This is done via the remap-results middleware, it'll:

Add a new column to the result with values are the remapped values of the original column
Add a remapped_to property to the metadata of the original field
Add a new metadata field for the remapped column which has a special remapped_from property

{
  :metadata  ;; The metadata from the source questions
  [
    {:description "The rating (on a scale of 1-5) the user left.",
     :table_id 4,
     :name "RATING",
     :field_ref [:field 31 nil],
     :effective_type :type/Integer,
+    :remapped_to "Rating",
     :id 31,
     :display_name "Rating",
     }
+   {:description nil,
+    :semantic_type nil,
+    :table_id nil,
+    :name "Rating",
+    :remapped_from "RATING",
+    :remapped_to nil,
+    :id nil,
+    :target nil,
+    :display_name "Rating",
+    :base_type :type/Text}
]}

With the remapped_from and remapped_to property, FE could figure which is the original column, which is the remapped column.

Back to our problem

The problem with #23449 is that the generated query selects a field that does not exist in the source query

SELECT id, rating, rating_2, ;; rating_2 does not exist in the source query
FROM (
  SELECT id, rating
  FROM reviews
)

This is because our source question is a question with remapped column, thus the the metadata of that question will contains 2 entries for the same rating field. One for the original field, another one for the remapped columns that remap-results added.

This led to the add-implicit-columns middleware will add 2 fields for rating column

  :fields
   [[:field 36 nil]           # original rating
    [:field "Rating" nil]  # remapped values of rating
    ...]

My solution

In the add-implicit-clauses middleware, instead of inferring a fields array based on all fields in the source metadata, filter out any fields that are remapped fields.

For how to test, please refer to #23449.

P/s: I'll add tests once someone could take a look if I'm heading in the right direction.

Aug 04 '22 12:08 qnkhuat

Codecov Report

Merging #24589 (3ea2334) into master (c41ecda) will decrease coverage by 0.52%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #24589      +/-   ##
==========================================
- Coverage   65.19%   64.66%   -0.53%     
==========================================
  Files        2831     2873      +42     
  Lines       85700    86646     +946     
  Branches    10816    11030     +214     
==========================================
+ Hits        55869    56031     +162     
- Misses      25398    26167     +769     
- Partials     4433     4448      +15

Flag	Coverage Δ
back-end	`85.82% <100.00%> (+<0.01%)`	:arrow_up:
front-end	`45.75% <ø> (-0.62%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...se/query_processor/middleware/results_metadata.clj	`97.61% <100.00%> (+0.32%)`	:arrow_up:
...ses/components/DatabaseEditApp/Sidebar/Sidebar.jsx	`70.37% <0.00%> (-24.08%)`	:arrow_down:
frontend/src/metabase/static-viz/lib/numbers.ts	`81.25% <0.00%> (-18.75%)`	:arrow_down:
...tend/src/metabase/containers/SaveQuestionModal.jsx	`78.43% <0.00%> (-15.51%)`	:arrow_down:
...ntend/src/metabase/core/components/Input/Input.tsx	`66.66% <0.00%> (-13.34%)`	:arrow_down:
frontend/src/metabase-lib/lib/metadata/Table.ts	`80.64% <0.00%> (-11.95%)`	:arrow_down:
...z/components/LineAreaBarChart/LineAreaBarChart.tsx	`12.50% <0.00%> (-9.73%)`	:arrow_down:
.../visualizations/components/settings/ColumnItem.jsx	`90.90% <0.00%> (-9.10%)`	:arrow_down:
...izations/components/ChartSettingsWidget.styled.tsx	`76.92% <0.00%> (-6.42%)`	:arrow_down:
frontend/src/metabase-lib/lib/Mode/utils.ts	`64.00% <0.00%> (-5.57%)`	:arrow_down:
... and 136 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Aug 04 '22 12:08 codecov[bot]

It is not super clear how this works to me. Here's what I'm thinking we have to do in the problem space: Since these fields are actually created in "user space" after the query runs, they can only happen top level. We don't emit any case logic inside the query, we transform them in Clojure. So we really have to never insert these extra columns unless at the top level the column makes its way up and then we add an extra column (note: this seems like a weird way to do it. I wonder what design constraints existed such that an extra column was chosen over annotating the existing column to be interpreted). Does that make sense?

An example where this fix still breaks:

Do the remapping for orders.reviews, mapping the rating to some letters a, b, c, d, etc
make a question with limit 2 (just to keep things in the repl tidy) and save it
Make a new question based on products and join to the question you just saved on product.id = question.product_id
since we limited it to 2 rows, set a filter where question.product_id is not null

I think this is happening because where you are filtering out the remapped fields is kinda arbitrary. You filter them out in source-metadata->fields which sounds right but that branch is taken only when there isn't a source-table-id on a part of a query.

That query looks like this:

(qp/process-query {:type :query,
                   :query {:source-table 1, ;; products
                           :joins [{:fields "all",
                                    :source-table "card__1",
                                    :condition ["="
                                                ["field" 30 nil] ;; products.id
                                                ["field" 33 {:join-alias "Question 1"}]], ;; reviews.product.id
                                    :alias "Question 1"}],
                           :limit 2},
                   :database 1,
                   :parameters []})

And it is emitting this sql:


 SELECT "PUBLIC"."products"."id"         AS "ID",
       "PUBLIC"."products"."ean"        AS "EAN",
       "PUBLIC"."products"."title"      AS "TITLE",
       "PUBLIC"."products"."category"   AS "CATEGORY",
       "PUBLIC"."products"."vendor"     AS "VENDOR",
       "PUBLIC"."products"."price"      AS "PRICE",
       "PUBLIC"."products"."rating"     AS "RATING",
       "PUBLIC"."products"."created_at" AS "CREATED_AT",
       "Question 1"."id"                AS "Question 1__ID",
       "Question 1"."product_id"        AS "Question 1__PRODUCT_ID",
       "Question 1"."reviewer"          AS "Question 1__REVIEWER",
       "Question 1"."rating"            AS "Question 1__RATING",
       "Question 1"."body"              AS "Question 1__BODY",
       "Question 1"."created_at"        AS "Question 1__CREATED_AT",
       "Question 1"."Rating"            AS "Question 1__Rating_2" -- picked up the fake column
FROM   "PUBLIC"."products"
       LEFT JOIN (SELECT "PUBLIC"."reviews"."id"         AS "ID",
                         "PUBLIC"."reviews"."product_id" AS "PRODUCT_ID",
                         "PUBLIC"."reviews"."reviewer"   AS "REVIEWER",
                         "PUBLIC"."reviews"."rating"     AS "RATING",
                         "PUBLIC"."reviews"."body"       AS "BODY",
                         "PUBLIC"."reviews"."created_at" AS "CREATED_AT"
                  FROM   "PUBLIC"."reviews"
                  LIMIT  2) "Question 1"
              ON "PUBLIC"."products"."id" = "Question 1"."product_id"
LIMIT  2

Aug 11 '22 18:08 dpsutton

After looking at this problem again I came up with a new solution for this, please check the updated "My solution" in the description.

Aug 17 '22 15:08 qnkhuat

I'm wondering how this would affect this situation: A base question with a remapped column. Let's use the rating mapped to A, B, etc.

Question 1 is a saved question pseudo-sql: select id, rating, reviewer from reviews where created_at in 'March' That would have metadata id, rating (mapped-from), rating (mapped to), reviewer.

If you do a nested query on this question, select * from question__1 where reviewer = 'sue'

I would expect to see results like the following in the view:

id  |   Rating    | reviewer
2   |    A        | Sue
7   |    B        | Sue

But if we remove the metadata from the save step I don't think this is possible. And if we don't have it on the question's metadata the FE won't know about it.

I think some we need to be clear about metadata on query creation and query post-processing. I agree we want all of these pseudo-columns removed when we are building the query. But on the post processing side, I think we want this information to trickle up to the top level, and then if a remapped column is present "at the top level" we want to insert the remapped values right?

-- Preprocessing
select * from question__1 where reviewer = 'sue'
-- becomes
select id, rating, reviewer  -- fields from question 1 metadata with
                             -- pseudo columns removed, because those don't exist
                             -- in sql
  from (select id, rating reviewer
        from products
        where created_at = 'March')
where reviewer = 'sue'

-- on post processing side
-- we annotate with field information and it "bubbles up"
-- see `add-column-info`

-- And if we see a remapped column ends up at the "top level"
-- we should remap the result

-- metadata [id, rating, reviewer]

select id, rating, reviewer  -- these columns are identified with the columns from
                             -- the subselect id, rating (remapped_from)
                             -- and the remapped_from should end up on top level rating
  from (select id, rating reviewer
        from products
        where created_at = 'March')
where reviewer = 'sue'

-- so sequence of values
-- [2, 5, 'Sue'] with metadata [id, rating, reviewer]
-- becomes
-- [2, 5, 'A', 'Sue'] with metadata [id, rating, Rating, reviewer]
-- in `remap-results`

Aug 25 '22 19:08 dpsutton

I would expect to see results like the following in the view:

Let me start my answer by assuring you that we still got this.

I have an e2e test in test here to demonstrate that, you can see that it still returns "$$$" in the result.

But on the post-processing side, I think we want this information to trickle up to the top level, and then if a remapped column is present "at the top level" we want to insert the remapped values right?

Actually, we do not :D.

In order to remap columns (no matter nested or not), we don't rely on the fact a column was "annotated" that it should be remapped.

Instead, at remap-results middleware (a post-processing middleware), we look into all the selected columns and find any columns that "should" have a remapping.

https://github.com/metabase/metabase/blob/fa6ee214e9b8d2fb4cccf4fc88dc1701face777b/src/metabase/query_processor/middleware/add_dimension_projections.clj#L432-L434

We know if a column has a remapping if there is a Dimension for that field. And if there is a remapping column, we add the values to each row in

https://github.com/metabase/metabase/blob/fa6ee214e9b8d2fb4cccf4fc88dc1701face777b/src/metabase/query_processor/middleware/add_dimension_projections.clj#L450

On FE side, not saving remapped fields in results_metadata doesn't affect it because FE relies on [:data :cols] to know which columns should be remapped to which. And we still return all the columns -- including the psuedo-columns in [:data :cols]

P/s: the flow I described above applies only for remapping using custom values, for remapping using an FK key, we have a different approach.

Aug 26 '22 04:08 qnkhuat

metabase
metabase copied to clipboard

Filter out remapped fields when getting fields from source-metadata

Background

How we go from simple MBQL query -> native query

How we deal with remapping columns

Back to our problem

My solution

Codecov Report

metabase metabase copied to clipboard

Filter out remapped fields when getting fields from source-metadata

Background

How we go from simple MBQL query -> native query

How we deal with remapping columns

Back to our problem

My solution

Codecov Report

metabase
metabase copied to clipboard