graphql-engine
graphql-engine copied to clipboard
Count aggregate that takes multiple columns does not compute correct results
Version Information
Server Version: 2.8.4
Environment
OSS
What is the current behaviour?
For a Postgres data source, if you issue a GraphQL aggregate query using a count aggregation function that takes multiple columns, such as:
query {
Invoice_aggregate {
aggregate {
count(columns: [BillingState, BillingPostalCode])
}
}
}
this returns a count of all rows in the Invoice table. This is incorrect.
What is the expected behaviour?
The expected behaviour is to return a count of all rows where either/both BillingState
and BillingPostalCode
are not null. (ie. if both are null, then the row is not counted).
Please provide any traces or logs that could help here.
The SQL generated for the above query is:
SELECT
json_build_object(
'aggregate',
json_build_object(
'count',
COUNT(("BillingState", "BillingPostalCode"))
)
) AS "root"
FROM
(
SELECT
"_0_root.base"."BillingState" AS "BillingState",
"_0_root.base"."BillingPostalCode" AS "BillingPostalCode"
FROM
(
SELECT
*
FROM
"public"."Invoice"
WHERE
('true')
) AS "_0_root.base"
) AS "_1_root"
The problem is that COUNT is being passed a tuple of BillingState and BillingPostalCode, and this tuple is never null (regardless of if the columns themselves are). So all rows are counted.
Any possible solutions?
Instead of tupling the columns, pass them to the COALESCE
function. However, this would produce rather incoherent behaviour when distinct
is also set to true on the count field.
I think the best "solution" would simply be removing the ability to count across multiple columns. It is not supported by the SQL Server backend already, and is broken in Postgres right now. If you want to count rows that are not null across multiple columns, use a filter in the GraphQL and a plain count.
you can just use the where clause inside Invoice_aggregate,
Invoice_aggregate(where: { BillingState: { _neq: null }, BillingPostalCode: { _neq: null } })
Closed with above answer.
@seanparkross I don't think this should be closed. The suggested solution is a workaround and doesn't solve the issue with the count aggregate.
This issue was opened to highlight the fact that one of our features doesn't actually work as intended. Just because I can compute the result using a different feature doesn't mean we shouldn't (at some point) resolve the actual issue with count.