malloy icon indicating copy to clipboard operation
malloy copied to clipboard

path-less aggregates on repeated records produce unexpected results

Open mtoy-googly-moogly opened this issue 1 year ago • 4 comments

I don't yet have a reproducible test case, but here are some fragments from the Slack exchange. For some customers, the symmetric aggregate code is producing numbers which are clearly wrong. Not sure why

source: invoices is table('...') {
    dimension: flexsave_amount is (ii.price * ii.quantity) / ii.usdXchange
}

source: flexsave is from (invoices -> {
    group_by: customer_id, invoice_date_month
    aggregate: flexsave_total is sum(flexsave_amount)
})

image

ii is a repeated record, and the correct expression is aggregate: flexsave_total is ii.sum(flexsave_amount) we probably should error on this instead of writing bad sql

mtoy-googly-moogly avatar Apr 17 '23 16:04 mtoy-googly-moogly

@lloydtabb what would the test be to catch this and error on it ... is it the dimension: which references repeated records

mtoy-googly-moogly avatar Apr 17 '23 17:04 mtoy-googly-moogly

Look at all fields referenced in a SUM. Make sure the struct paths of all dimensional the entities referenced do not have "one to many joins" relative to the struct path of the sum.

I could also catch this in malloy_query...

lloydtabb avatar Apr 17 '23 17:04 lloydtabb

it would extend to any any aggregate which would resort to symmetric magic to compute the aggregation ... not sure how this should be reflected into the future function magic that @christopherswenson is working on.

mtoy-googly-moogly avatar Apr 17 '23 23:04 mtoy-googly-moogly

I think this is the same issue I ran into when trying to build a sample analysis on the GA4 public dataset. The queries in this file should provide an example where the issue is reproducible.

carlineng avatar Aug 14 '23 23:08 carlineng