metabase icon indicating copy to clipboard operation
metabase copied to clipboard

Flow :semantic-type through QP for aggregations

Open adam-james-v opened this issue 1 year ago • 2 comments

Fixes: #38022

Before

Before some changes to the QP, I think probably some mlv2 changes, particularly those affecting metadata calculation, we would return a :semantic_type on aggregations. Here's a screenshot of Metabase 47 (left) and master (49-RC1, but this existed in 48 as well I think) (right).

This example is showing the response from /api/dataset which runs queries for unsaved questions. Notice that in 47 the semantic type key does exist in the response but it does not exist in our latest code. This causes the sum to not be formatted correctly (there's no $ on the sum on the right).

image

After:

This method in metabase.lib.aggregation:

(defmethod lib.metadata.calculation/metadata-method ::aggregation
  [query stage-number [_tag _opts first-arg :as clause]]
  (merge
   ;; flow the `:options` from the field we're aggregating. This is important, for some reason.
   ;; See [[metabase.query-processor-test.aggregation-test/field-settings-for-aggregate-fields-test]]
   (when first-arg
     ;; Here I also select :semantic-type from the field's metadata. 
     (select-keys (lib.metadata.calculation/metadata query stage-number first-arg) [:settings :semantic-type]))
   ((get-method lib.metadata.calculation/metadata-method :default) query stage-number clause)))

Now includes the :semantic-type from the field. This method does basically then proceed to use the :default metadata method, which I did consider changing, but I opted instead to change just the method for ::aggregation because that's where the issue manifests.

If I don't aggregate at all, the semantic_type of each column is returned as normal, which means that the metadata is being fetched/calculated correctly, it just was not being used completely on aggregations.

image

adam-james-v avatar Feb 15 '24 19:02 adam-james-v

Status In Progress ↗︎ 51 / 52
Commit f578e2ededafdd1f258a29fbcdc333ae5af25afc
Results
⚠️ 2 Flaky
2316 Passed

replay-io[bot] avatar Feb 16 '24 00:02 replay-io[bot]

I think there's some more test cases to exercise here.

The argument to an aggregation and the aggregation's own result don't necessarily have the same semantic type. On the other hand I see that metadata-method ::aggregation is what you're editing, and they all use that except :count-where, which overrides :semantic-type :type/Quantity.

You'll want to check all the aggregation operators - if they take an argument, make sure they either share the :semantic-type of the first argument (eg. sum, avg, min, max), or that there's a specific metadata-method (eg. :count-where).

I think this might already be broken for distinct, which should be a :type/Quantity rather than blank (now) or copied from the column (your PR).

Does this make sense?

Yep, that makes sense. Are these all of the possible aggregations?


(lib.common/defop count       [] [x])
(lib.common/defop cum-count   [] [x])
(lib.common/defop count-where [x y])
(lib.common/defop avg         [x])
(lib.common/defop distinct    [x])
(lib.common/defop max         [x])
(lib.common/defop median      [x])
(lib.common/defop min         [x])
(lib.common/defop percentile  [x y])
(lib.common/defop share       [x])
(lib.common/defop stddev      [x])
(lib.common/defop sum         [x])
(lib.common/defop cum-sum     [x])
(lib.common/defop sum-where   [x y])
(lib.common/defop var         [x])

adam-james-v avatar Feb 16 '24 17:02 adam-james-v

Yes, that's all the aggregations.

I think individually these changes are good, but I think we should take advantage of multimethod dispatch to reduce duplication here, like is done elsewhere in this file. There's really only three cases: always a :type/Quantity (eg. :count), keep the inner :semantic-type (eg. SUM), and drop :semantic-type for the unit-less :var.

See the docs if you're not familiar with derive and hierarchies. There's a few existing ones in this namespace; I think we can reuse ::count-aggregation.

(lib.hierarchy/derive ::count-aggregation ::quantity-aggregation)
(lib.hierarchy/derive :distinct ::quantity-aggregation)

(doseq [tag [:percentile :var]]
  (lib.hierarchy/derive tag ::no-semantic-type))

(defmethod lib.metadata.calculation/metadata-method
  ::quantity-aggregation
  ; ...
 (assoc ... :semantic-type :type/Quantity))

;; The default preserves the semantic type.

;; But for ::no-semantic-type we should drop it.
(defmethod lib.metadata.calculation/metadata-method
  ::no-semantic-type
  ; ...
(dissoc ... :semantic-type))

bshepherdson avatar Feb 22 '24 16:02 bshepherdson

@adam-james-v Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

github-actions[bot] avatar Feb 22 '24 23:02 github-actions[bot]