metabase
metabase copied to clipboard
Flow :semantic-type through QP for aggregations
Fixes: #38022
Before
Before some changes to the QP, I think probably some mlv2 changes, particularly those affecting metadata calculation, we would return a :semantic_type on aggregations. Here's a screenshot of Metabase 47 (left) and master (49-RC1, but this existed in 48 as well I think) (right).
This example is showing the response from /api/dataset which runs queries for unsaved questions. Notice that in 47 the semantic type key does exist in the response but it does not exist in our latest code. This causes the sum to not be formatted correctly (there's no $ on the sum on the right).
After:
This method in metabase.lib.aggregation:
(defmethod lib.metadata.calculation/metadata-method ::aggregation
[query stage-number [_tag _opts first-arg :as clause]]
(merge
;; flow the `:options` from the field we're aggregating. This is important, for some reason.
;; See [[metabase.query-processor-test.aggregation-test/field-settings-for-aggregate-fields-test]]
(when first-arg
;; Here I also select :semantic-type from the field's metadata.
(select-keys (lib.metadata.calculation/metadata query stage-number first-arg) [:settings :semantic-type]))
((get-method lib.metadata.calculation/metadata-method :default) query stage-number clause)))
Now includes the :semantic-type from the field. This method does basically then proceed to use the :default metadata method, which I did consider changing, but I opted instead to change just the method for ::aggregation because that's where the issue manifests.
If I don't aggregate at all, the semantic_type of each column is returned as normal, which means that the metadata is being fetched/calculated correctly, it just was not being used completely on aggregations.
| Status | In Progress ↗︎ 51 / 52 |
| Commit | f578e2ededafdd1f258a29fbcdc333ae5af25afc |
| Results | ⚠️ 2 Flaky
✅ 2316 Passed
|
I think there's some more test cases to exercise here.
The argument to an aggregation and the aggregation's own result don't necessarily have the same semantic type. On the other hand I see that
metadata-method ::aggregationis what you're editing, and they all use that except:count-where, which overrides:semantic-type :type/Quantity.You'll want to check all the aggregation operators - if they take an argument, make sure they either share the
:semantic-typeof the first argument (eg.sum,avg,min,max), or that there's a specificmetadata-method(eg.:count-where).I think this might already be broken for
distinct, which should be a:type/Quantityrather than blank (now) or copied from the column (your PR).Does this make sense?
Yep, that makes sense. Are these all of the possible aggregations?
(lib.common/defop count [] [x])
(lib.common/defop cum-count [] [x])
(lib.common/defop count-where [x y])
(lib.common/defop avg [x])
(lib.common/defop distinct [x])
(lib.common/defop max [x])
(lib.common/defop median [x])
(lib.common/defop min [x])
(lib.common/defop percentile [x y])
(lib.common/defop share [x])
(lib.common/defop stddev [x])
(lib.common/defop sum [x])
(lib.common/defop cum-sum [x])
(lib.common/defop sum-where [x y])
(lib.common/defop var [x])
Yes, that's all the aggregations.
I think individually these changes are good, but I think we should take advantage of multimethod dispatch to reduce duplication here, like is done elsewhere in this file. There's really only three cases: always a :type/Quantity (eg. :count), keep the inner :semantic-type (eg. SUM), and drop :semantic-type for the unit-less :var.
See the docs if you're not familiar with derive and hierarchies. There's a few existing ones in this namespace; I think we can reuse ::count-aggregation.
(lib.hierarchy/derive ::count-aggregation ::quantity-aggregation)
(lib.hierarchy/derive :distinct ::quantity-aggregation)
(doseq [tag [:percentile :var]]
(lib.hierarchy/derive tag ::no-semantic-type))
(defmethod lib.metadata.calculation/metadata-method
::quantity-aggregation
; ...
(assoc ... :semantic-type :type/Quantity))
;; The default preserves the semantic type.
;; But for ::no-semantic-type we should drop it.
(defmethod lib.metadata.calculation/metadata-method
::no-semantic-type
; ...
(dissoc ... :semantic-type))
@adam-james-v Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?