superset icon indicating copy to clipboard operation
superset copied to clipboard

TOTALS incorrect for non-additive metric in pivot table

Open lameapparition opened this issue 2 years ago • 12 comments

Hello, I'm new into Superset. Trying to move our analytics to this platform, but i met problem as I see no possibility of making Total row correct for non-countable (or non-additive) metric SuperSet version: 3.0.0

Example: I have three metrics grouped by month:

  1. "spend plan" – amount of money company was planning to spend
  2. "spend fact" – amount of money company actually spended
  3. "spend rate" – ratio of spend fact and spend plan So, I created pivot table and added totals. "spend plan" and "spend fact" are ok – its totals are sums of all values but for "spend rate" I also see sums of all values instead of ratio of "spend fact" and "spend plan" totals image image

Expected result (using Google Spreadsheets): image Maybe I'm missing something, hope to get help

UPD. Found that usual table calculates "spend rate" correctly image

Originally posted by @lameapparition in https://github.com/apache/superset/issues/14061#issuecomment-1777176807

lameapparition avatar Oct 24 '23 13:10 lameapparition

This is a known limitation right now. The solution would be to trigger additional queries for all aggregation levels. By doing so even non-additive metrics would display correctly (this is how the table chart does it). However, the required change is rather complex, and requires good knowledge of how the pivot table chart works. IIRC I've discussed this with @kgabryje a few times a while ago (he's the original author of the chart plugin), but we were never able to find the time to work on implementing this improvement.

villebro avatar Oct 26 '23 20:10 villebro

There'd be two main ways to approach this:

  • having different semantics around formulas and an engine that's formula-aware, meaning it could pushdown to SQL or render on the backend/frontend. This may work for ratios or averages, but not for distinct counts or percentiles
  • running multiple queries, one for each aggregation-level. This can get complex if service into sub-totals and multiple dimensions

Both of those would be significant work and add much more complexity to the current SQL-oriented approach.

It could be good though to hide totals for non-additive metrics though, probably through per-metric inference or configuration. If it's a simple metric, we know which aggregation methods are additive and/or why. If it's SQL expression, the user may have to tell us if it's additive or not. A simple quick-fix would be to only show the totals where people use the SUM-type simple metric (which is probably 80%+ of metrics out there).

Interestingly, for things like average we could be smarter, and for things like ratio (ask the db for a SUM(col) and a COUNT(col)) and do our own math. For ratio we could introduce new "not-so-simple" metrics where you specify your numerator metric and denominator metric and we could compute totals properly....

mistercrunch avatar Feb 05 '24 23:02 mistercrunch

Hey all,

Just FYI, we have had quite a few issues on this topic. I've closed out a couple of the other ones in favor of this one. lease feel free to do they same as they arise!

rusackas avatar Jun 03 '24 17:06 rusackas

Hi!

Any update for this? I working in a table and I found same issue about the totals.

Thanks in advance...

felipegranado avatar Jul 11 '24 21:07 felipegranado

Bumping this because I believe that the ability to aggregate non-additive data is crucial for any BI tool. All of the major tools present on the market can handle that effortlessly.

As we define SQL for measures (and they should be proper SQL aggregates) in dataset options, I think that Superset already has all the required information to calculate subtotals. So,

  • For each aggregation level, the pivot table could send a query to the database with properly generated GROUP BY.
  • In chart settings, the dropdown that allows to choose aggregation type can be safely removed.

This approach has an obvious drawback of spamming queries if there is a lot of aggregation levels. But realistically no pivot table would have even a dozen of them, so such an overhead should be tolerable.

@mistercrunch @rusackas @villebro Could we expect such a feature in one of the upcoming releases?

goldjee avatar Aug 09 '24 07:08 goldjee

A better/easier way to implement this ( and that will add an additional feature ) is to implement weighted sum. Here in this case while calculating sum ( for rations and % columns ) we need to weight by the value of denominator column and this weighed sum needs to be normalized/divided by sum of denominator column.

This will effectively give sum(numerator)/sum(denominator)

E.g.

  Numerator Denominator Ratio Weight (=Denominator) Weight X Ratio ( Hidden )
  10 40 0.25 40 10
  3 80 0.0375 80 3
  30 50 0.6 50 30
  40 80 0.5 80 40
  35 100 0.35 100 35
Totals 118 350 1.7375 350 118

The total of ratio is 1.7375 is current superset behavior. If we implement weighted sum it will be sum of weight*ratio / sum of weights = 118/350 = 0.33714286

From GUI we need to save a dict of which column(s) sum is/are weighed and what is/are the corresponding weight column(s).

Also this will not need additional db hits. We work with the same data we got in original query.

ramrajsingh avatar Oct 30 '24 09:10 ramrajsingh

We thought about implementing a mini-language for metrics in the past as some other BI tools do as it would serve other use cases beyond this one. Say if you define a complex metric as a long SUM(CASE WHEN {...} END) and would like to reuse it in a ratio or another metric for whatever reason, currently you have to duplicate the logic which is ok but not ideal.

One approach would be to use the jinja support in place by adding some sort of macro for say {{ dataset.metrics("my_numerator") }} / {{ dataset.metrics("my_numerator") / }}, but this wouldn't work for in-frontend-type computation like the use case we're exploring here.

If we were to come up with a higher level / declarative mini-language for metrics, we'd have the luxury to decide whether to compute on the backend or frontend depending on the use case. Not sure if there are good libs that would work well for this. On the top of my head, it'd have to support metric referencing, inline sql expressions, aggregation semantics (SUM, COUNT_DISTINCT, AVG, ...), and common arithmetic operations, and be interpretable in both javascript and python.

mistercrunch avatar Oct 30 '24 16:10 mistercrunch

@mistercrunch Are there any updates on this feature yet? Can you please suggest a work around if available

rishika-ushur avatar Feb 11 '25 15:02 rishika-ushur

Hi, is there any update on this? Because it's a pretty basic behaviour for a BI tool

MarcoFranchin96 avatar Feb 14 '25 13:02 MarcoFranchin96

Really looking forward to see when this will be solved. I am moving from Tableau mainly because I grew tired of how heavy Prep is, and how long it takes to clean all my data sets. Also because is expensive... But net being able to have a pivot table that subtotals both sum and avg is a major issue. In my current setup in Superset I ended up creating 2 tables, one for $$$ and the other one for % but is a pain for layout purposes in the dashboard.

mhgapps avatar Feb 27 '25 15:02 mhgapps

Is it possible to use custom plugin for chart or smth like that? It's pretty strange, that there're no solution yet...

Valerii-Tcvetkov avatar Mar 12 '25 10:03 Valerii-Tcvetkov

This just hit my potential-stale-issue radar, but obviously it's still relevant. I'm not sure whether or not some of the newer solutions @yousoph has been looking into on the Preset side have taken this into account or not.

rusackas avatar Jun 10 '25 03:06 rusackas

Maybe the difference between SuperSet and some other platforms is:

  • SuperSet calculates totals on the base of rows (vertical path)
  • Other calculate totals on the base of columns using the formula for the row (horizontal path) Example: row 1 row2 row3 (SUM(row2)/SUM(row1) *100) SuperSet Other 1 1 100 100 10 4 40 40 SUM SUM AVG() == (SUM(row2)/SUM(row1) *100) 11 5 70 45

YuriPP avatar Jun 16 '25 09:06 YuriPP

Hi! I tried to fix this problem and have already uploaded my changes in #34592. I deleted aggregate functions and calculate totals by executing multiple, separate queries with varying GROUP BY clauses.

rafal-celinski avatar Aug 07 '25 09:08 rafal-celinski

Keep "Total (Sum)" option, but add another "Total (Overall)" option

The current implementation is actually very useful.

  • Sometimes, in Superset, it is wanted to validly SUM to get the TOTAL of whatever aggregate values were above. I have use cases of this.

But, after seeing the different "Total" usage (apply same aggregate function to all underlying data), in another tool, now I see why this issue exists. Some people want to do that, too.

Please consider NOT removing "Total (Sum)" option from Pivot Tables in Superset.

If anything, consider ALSO adding another checkbox, with another label, for the "use the same aggregate" function. This could be called simply "Total", or any of: "Total (Overall)", or "Total (Aggregate)", "Overall (Aggregate)", "Overall Agg"; or something else.

Keep what is there, but also do what other people want. Please and thank you.

u35253 avatar Nov 20 '25 17:11 u35253