duckdb-web icon indicating copy to clipboard operation
duckdb-web copied to clipboard

count_if: sometimes behaves like an aggregate function, but sometimes not - extra documentation welcome.

Open rpbouman opened this issue 7 months ago • 1 comments

count_if() is documented here:

https://duckdb.org/docs/sql/functions/utility#count_ifx

when we try it, a basic example suggests it's actually an aggergate function:

select count_if(x) 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

┌─────────────┐
│ count_if(x) │
│   int128    │
├─────────────┤
│           2 │
└─────────────┘

When we add y as non-aggregate, non-group by column, we also get a message we would expect if count_if were an aggregate function:

select count_if(x), y 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

Binder Error: column "y" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(y)" if the exact value of "y" is not important.

Going by these examples we might think it works as any count_if(x) could be rewritten as something like: count(case x::BOOLEAN when true then true else null end).

Despite the similarity, count_if() is not allowed as window function, and the error message in that case suggests count_if() is not an aggregate function:

select count_if(x) over () 
from (values (1, 'a'), (2, 'b'), (null, 'c') ) as t( x, y );

result:

Catalog Error: count_if is not an aggregate function

I think this behavior is sufficiently confusing to allow some extra documentation.

Alternatively, perhaps the implementation of count_if could be modified to allow it to be used just like count, i.e. the current behavior plus the ability to use it as window function. In that case it could also be moved to the aggregate functions page and window functions page.

rpbouman avatar Jul 16 '24 20:07 rpbouman