metabase
metabase copied to clipboard
Filter out remapped fields when getting fields from source-metadata
Fix #23449
Background
How we go from simple MBQL query -> native query
Let's go through the life of a query and how it traverses through some of our middleware. Our concern here is how we get from a simple MBQL query, to a full native query that knows which fields to select, and which tables to select from.
When we create a query that uses a saved questions/model as a card, the query will look like this:
{
:query {
:source-table "card__6"
}
}
- The
add-source-metadata-for-source-queries
middleware will find thecard__6
, and update the query with the metadata of the source query
{
:query {
- :source-table "card__6",
+ :source-card-id 6,
+ :source-query {:source-table 4},
+ :source-metadata ;; The metadata from the source questions
+ [{:description "A unique internal identifier for the review. Should not be used externally.",
+ :semantic_type :type/PK,
+ :table_id 4,
+ :coercion_strategy nil,
+ :name "ID",
+ :settings nil,
+ :source :fields,
+ :field_ref [:field 36 nil],
+ :effective_type :type/BigInteger,
+ :nfc_path nil,
+ :parent_id nil,
+ :id 36,
+ :visibility_type :normal,
+ :display_name "ID",
+ :fingerprint nil,
+ :base_type :type/BigInteger}
+ ...
+ ]}
}
- Because we don't specify which fields we're selecting, the
add-implicit-clauses
will assume that we select all available columns in the source questions. It basically will loop thesource-metadata
and create afields
array.
{
:query {:source-card-id 6,
:source-query {:source-table 4},
+ :fields
+ [[:field 36 nil]
+ [:field 33 nil]
+ ...]
:source-metadata ;; The metadata from the source questions
[{:description "A unique internal identifier for the review. Should not be used externally.",
:semantic_type :type/PK,
:table_id 4,
:coercion_strategy nil,
:name "ID",
:settings nil,
:source :fields,
:field_ref [:field 36 nil],
:effective_type :type/BigInteger,
:nfc_path nil,
:parent_id nil,
:id 36,
:visibility_type :normal,
:display_name "ID",
:fingerprint nil,
:base_type :type/BigInteger}
...
]}
}
- From here, the
mbql-to-native
middleware will generate a query that:
- select fields defined in the
fields
array - from the
source query
How we deal with remapping columns
When query a field that has a remapping, we need to do 2 things:
- returning an additional column in the result that contains the mapped values
- letting FE knows which is the original column, and which column is the mapped column so that FE could display accordingly
This is done via the remap-results
middleware, it'll:
- Add a new column to the result with values are the remapped values of the original column
- Add a
remapped_to
property to the metadata of the original field - Add a new metadata field for the remapped column which has a special
remapped_from
property
{
:metadata ;; The metadata from the source questions
[
{:description "The rating (on a scale of 1-5) the user left.",
:table_id 4,
:name "RATING",
:field_ref [:field 31 nil],
:effective_type :type/Integer,
+ :remapped_to "Rating",
:id 31,
:display_name "Rating",
}
+ {:description nil,
+ :semantic_type nil,
+ :table_id nil,
+ :name "Rating",
+ :remapped_from "RATING",
+ :remapped_to nil,
+ :id nil,
+ :target nil,
+ :display_name "Rating",
+ :base_type :type/Text}
]}
With the remapped_from
and remapped_to
property, FE could figure which is the original column, which is the remapped column.
Back to our problem
The problem with #23449 is that the generated query selects a field that does not exist in the source query
SELECT id, rating, rating_2, ;; rating_2 does not exist in the source query
FROM (
SELECT id, rating
FROM reviews
)
This is because our source question is a question with remapped column, thus the the metadata of that question will contains 2 entries for the same rating
field. One for the original field, another one for the remapped columns that remap-results
added.
This led to the add-implicit-columns
middleware will add 2 fields for rating column
:fields
[[:field 36 nil] # original rating
[:field "Rating" nil] # remapped values of rating
...]
My solution
In the add-implicit-clauses
middleware, instead of inferring a fields
array based on all fields in the source metadata,
filter out any fields that are remapped fields.
For how to test, please refer to #23449.
P/s: I'll add tests once someone could take a look if I'm heading in the right direction.
Codecov Report
Merging #24589 (3ea2334) into master (c41ecda) will decrease coverage by
0.52%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #24589 +/- ##
==========================================
- Coverage 65.19% 64.66% -0.53%
==========================================
Files 2831 2873 +42
Lines 85700 86646 +946
Branches 10816 11030 +214
==========================================
+ Hits 55869 56031 +162
- Misses 25398 26167 +769
- Partials 4433 4448 +15
Flag | Coverage Δ | |
---|---|---|
back-end | 85.82% <100.00%> (+<0.01%) |
:arrow_up: |
front-end | 45.75% <ø> (-0.62%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
...se/query_processor/middleware/results_metadata.clj | 97.61% <100.00%> (+0.32%) |
:arrow_up: |
...ses/components/DatabaseEditApp/Sidebar/Sidebar.jsx | 70.37% <0.00%> (-24.08%) |
:arrow_down: |
frontend/src/metabase/static-viz/lib/numbers.ts | 81.25% <0.00%> (-18.75%) |
:arrow_down: |
...tend/src/metabase/containers/SaveQuestionModal.jsx | 78.43% <0.00%> (-15.51%) |
:arrow_down: |
...ntend/src/metabase/core/components/Input/Input.tsx | 66.66% <0.00%> (-13.34%) |
:arrow_down: |
frontend/src/metabase-lib/lib/metadata/Table.ts | 80.64% <0.00%> (-11.95%) |
:arrow_down: |
...z/components/LineAreaBarChart/LineAreaBarChart.tsx | 12.50% <0.00%> (-9.73%) |
:arrow_down: |
.../visualizations/components/settings/ColumnItem.jsx | 90.90% <0.00%> (-9.10%) |
:arrow_down: |
...izations/components/ChartSettingsWidget.styled.tsx | 76.92% <0.00%> (-6.42%) |
:arrow_down: |
frontend/src/metabase-lib/lib/Mode/utils.ts | 64.00% <0.00%> (-5.57%) |
:arrow_down: |
... and 136 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
It is not super clear how this works to me. Here's what I'm thinking we have to do in the problem space:
Since these fields are actually created in "user space" after the query runs, they can only happen top level. We don't emit any case
logic inside the query, we transform them in Clojure. So we really have to never insert these extra columns unless at the top level the column makes its way up and then we add an extra column (note: this seems like a weird way to do it. I wonder what design constraints existed such that an extra column was chosen over annotating the existing column to be interpreted). Does that make sense?
An example where this fix still breaks:
- Do the remapping for orders.reviews, mapping the rating to some letters a, b, c, d, etc
- make a question with limit 2 (just to keep things in the repl tidy) and save it
- Make a new question based on products and join to the question you just saved on product.id = question.product_id
- since we limited it to 2 rows, set a filter where question.product_id is not null
I think this is happening because where you are filtering out the remapped fields is kinda arbitrary. You filter them out in source-metadata->fields
which sounds right but that branch is taken only when there isn't a source-table-id
on a part of a query.
That query looks like this:
(qp/process-query {:type :query,
:query {:source-table 1, ;; products
:joins [{:fields "all",
:source-table "card__1",
:condition ["="
["field" 30 nil] ;; products.id
["field" 33 {:join-alias "Question 1"}]], ;; reviews.product.id
:alias "Question 1"}],
:limit 2},
:database 1,
:parameters []})
And it is emitting this sql:
SELECT "PUBLIC"."products"."id" AS "ID",
"PUBLIC"."products"."ean" AS "EAN",
"PUBLIC"."products"."title" AS "TITLE",
"PUBLIC"."products"."category" AS "CATEGORY",
"PUBLIC"."products"."vendor" AS "VENDOR",
"PUBLIC"."products"."price" AS "PRICE",
"PUBLIC"."products"."rating" AS "RATING",
"PUBLIC"."products"."created_at" AS "CREATED_AT",
"Question 1"."id" AS "Question 1__ID",
"Question 1"."product_id" AS "Question 1__PRODUCT_ID",
"Question 1"."reviewer" AS "Question 1__REVIEWER",
"Question 1"."rating" AS "Question 1__RATING",
"Question 1"."body" AS "Question 1__BODY",
"Question 1"."created_at" AS "Question 1__CREATED_AT",
"Question 1"."Rating" AS "Question 1__Rating_2" -- picked up the fake column
FROM "PUBLIC"."products"
LEFT JOIN (SELECT "PUBLIC"."reviews"."id" AS "ID",
"PUBLIC"."reviews"."product_id" AS "PRODUCT_ID",
"PUBLIC"."reviews"."reviewer" AS "REVIEWER",
"PUBLIC"."reviews"."rating" AS "RATING",
"PUBLIC"."reviews"."body" AS "BODY",
"PUBLIC"."reviews"."created_at" AS "CREATED_AT"
FROM "PUBLIC"."reviews"
LIMIT 2) "Question 1"
ON "PUBLIC"."products"."id" = "Question 1"."product_id"
LIMIT 2
After looking at this problem again I came up with a new solution for this, please check the updated "My solution" in the description.
I'm wondering how this would affect this situation: A base question with a remapped column. Let's use the rating mapped to A, B, etc.
Question 1 is a saved question pseudo-sql: select id, rating, reviewer from reviews where created_at in 'March'
That would have metadata id, rating (mapped-from), rating (mapped to), reviewer
.
If you do a nested query on this question,
select * from question__1 where reviewer = 'sue'
I would expect to see results like the following in the view:
id | Rating | reviewer
2 | A | Sue
7 | B | Sue
But if we remove the metadata from the save step I don't think this is possible. And if we don't have it on the question's metadata the FE won't know about it.
I think some we need to be clear about metadata on query creation and query post-processing. I agree we want all of these pseudo-columns removed when we are building the query. But on the post processing side, I think we want this information to trickle up to the top level, and then if a remapped column is present "at the top level" we want to insert the remapped values right?
-- Preprocessing
select * from question__1 where reviewer = 'sue'
-- becomes
select id, rating, reviewer -- fields from question 1 metadata with
-- pseudo columns removed, because those don't exist
-- in sql
from (select id, rating reviewer
from products
where created_at = 'March')
where reviewer = 'sue'
-- on post processing side
-- we annotate with field information and it "bubbles up"
-- see `add-column-info`
-- And if we see a remapped column ends up at the "top level"
-- we should remap the result
-- metadata [id, rating, reviewer]
select id, rating, reviewer -- these columns are identified with the columns from
-- the subselect id, rating (remapped_from)
-- and the remapped_from should end up on top level rating
from (select id, rating reviewer
from products
where created_at = 'March')
where reviewer = 'sue'
-- so sequence of values
-- [2, 5, 'Sue'] with metadata [id, rating, reviewer]
-- becomes
-- [2, 5, 'A', 'Sue'] with metadata [id, rating, Rating, reviewer]
-- in `remap-results`
I would expect to see results like the following in the view:
Let me start my answer by assuring you that we still got this.
I have an e2e test in test here to demonstrate that, you can see that it still returns "$$$" in the result.
But on the post-processing side, I think we want this information to trickle up to the top level, and then if a remapped column is present "at the top level" we want to insert the remapped values right?
Actually, we do not :D.
In order to remap columns (no matter nested or not), we don't rely on the fact a column was "annotated" that it should be remapped.
Instead, at remap-results
middleware (a post-processing middleware), we look into all the selected columns and find any columns that "should" have a remapping.
https://github.com/metabase/metabase/blob/fa6ee214e9b8d2fb4cccf4fc88dc1701face777b/src/metabase/query_processor/middleware/add_dimension_projections.clj#L432-L434
We know if a column has a remapping if there is a Dimension for that field. And if there is a remapping column, we add the values to each row in
https://github.com/metabase/metabase/blob/fa6ee214e9b8d2fb4cccf4fc88dc1701face777b/src/metabase/query_processor/middleware/add_dimension_projections.clj#L450
On FE side, not saving remapped fields in results_metadata doesn't affect it because FE relies on [:data :cols]
to know which columns should be remapped to which.
And we still return all the columns -- including the psuedo-columns in [:data :cols]
P/s: the flow I described above applies only for remapping using custom values, for remapping using an FK key, we have a different approach.