pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Add support for SUM(DISTINCT colA) aka DISTINCTSUM

Open vvivekiyer opened this issue 3 years ago • 2 comments

Currently, pinot only supports using COUNT aggregation function on DISTINCT. This is supported in two ways:

  1. DISTINCTCOUNT
  2. COUNT(DISTINCT colA)

https://github.com/apache/pinot/blob/e813867985746e916c8e898a530002551b661496/pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java#L788-L789

The ask in this issue to make pinot also support SUM aggregation function on DISTINCT. As mentioned above, this can be supported using two ways:

  1. SUM(DISTINCT colA) ---> SQL friendly way
  2. DISTINCTSUM

Similar support is offered by MySQL - https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_sum

vvivekiyer avatar Oct 12 '22 07:10 vvivekiyer

cc @walterddr @Jackie-Jiang

siddharthteotia avatar Oct 12 '22 16:10 siddharthteotia

this is not something postgres supports (which is much closer following SQL standard)

  • only COUNT(DISTINCT( <col> [, <col>]* )

however given we already support DISTINCTCOUNT, it seems ok to support DISTINCTSUM as a dialect sugar to me

walterddr avatar Oct 12 '22 17:10 walterddr

@vvivekiyer is working on this.

siddharthteotia avatar Jan 03 '23 22:01 siddharthteotia

Support has been merged for DISTINCTSUM and DISTINCTAVG for SV columns. Support for MV columns will be done in a follow-up PR.

siddharthteotia avatar Jan 11 '23 18:01 siddharthteotia

OSS issue for extending this support to MV columns https://github.com/apache/pinot/issues/10109

vvivekiyer avatar Jan 11 '23 19:01 vvivekiyer