graphql-engine Count aggregate that takes multiple columns does not compute correct results

Count aggregate that takes multiple columns does not compute correct results

Open daniel-chambers opened this issue 2 years ago • 1 comments

Version Information

Server Version: 2.8.4

Environment

OSS

What is the current behaviour?

For a Postgres data source, if you issue a GraphQL aggregate query using a count aggregation function that takes multiple columns, such as:

query {
  Invoice_aggregate {
    aggregate {
      count(columns: [BillingState, BillingPostalCode])
    }
  }
}

this returns a count of all rows in the Invoice table. This is incorrect.

What is the expected behaviour?

The expected behaviour is to return a count of all rows where either/both BillingState and BillingPostalCode are not null. (ie. if both are null, then the row is not counted).

Please provide any traces or logs that could help here.

The SQL generated for the above query is:

SELECT
  json_build_object(
    'aggregate',
    json_build_object(
      'count',
      COUNT(("BillingState", "BillingPostalCode"))
    )
  ) AS "root"
FROM
  (
    SELECT
      "_0_root.base"."BillingState" AS "BillingState",
      "_0_root.base"."BillingPostalCode" AS "BillingPostalCode"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Invoice"
        WHERE
          ('true')
      ) AS "_0_root.base"
  ) AS "_1_root"

The problem is that COUNT is being passed a tuple of BillingState and BillingPostalCode, and this tuple is never null (regardless of if the columns themselves are). So all rows are counted.

Any possible solutions?

Instead of tupling the columns, pass them to the COALESCE function. However, this would produce rather incoherent behaviour when distinct is also set to true on the count field.

I think the best "solution" would simply be removing the ability to count across multiple columns. It is not supported by the SQL Server backend already, and is broken in Postgres right now. If you want to count rows that are not null across multiple columns, use a filter in the GraphQL and a plain count.

Jul 26 '22 06:07 daniel-chambers

you can just use the where clause inside Invoice_aggregate,

Invoice_aggregate(where: { BillingState: { _neq: null }, BillingPostalCode: { _neq: null } })

Jul 26 '22 15:07 ilijaNL

Closed with above answer.

Aug 25 '22 12:08 seanparkross

@seanparkross I don't think this should be closed. The suggested solution is a workaround and doesn't solve the issue with the count aggregate.

This issue was opened to highlight the fact that one of our features doesn't actually work as intended. Just because I can compute the result using a different feature doesn't mean we shouldn't (at some point) resolve the actual issue with count.

Sep 01 '22 00:09 daniel-chambers

graphql-engine graphql-engine copied to clipboard

Count aggregate that takes multiple columns does not compute correct results

Version Information

Environment

What is the current behaviour?

What is the expected behaviour?

Please provide any traces or logs that could help here.

Any possible solutions?

graphql-engine
graphql-engine copied to clipboard