graphql icon indicating copy to clipboard operation
graphql copied to clipboard

When querying only for `totalCount` within a `*Connection` query, the generated cypher queries and returns too much data resulting in serious performance-issues

Open andreloeffelmann opened this issue 9 months ago • 5 comments

Describe the bug When querying only for totalCount within a *Connection query, database-internal data is returned for each found node-instance resulting in a HUGE response. This totally kills the performance when having many nodes. See the steps to reproduce. When running this query on our production database with Neo4j Desktop, the app completely dies and shuts down. And we only have around 2m nodes of this type. We have other types with ~80m nodes. Don't want to know what would happen if I executed this query on that type...

Type definitions

type Norm {
  code: String!
}

To Reproduce

  1. Run a server with the type defs
  2. Execute the following Query:
query {
  normsConnection {
    totalCount
  }
}
  1. The following cypher is generated
MATCH (this0:Norm)
WITH collect({ node: this0 }) AS edges
WITH edges, size(edges) AS totalCount
CALL {
  WITH edges
  UNWIND edges AS edge
  WITH edge.node AS this0
  RETURN collect({ node: { __id: id(this0), __resolveType: "Norm" } }) AS var1
}
RETURN { edges: var1, totalCount: totalCount } AS this
  1. This results in a response with the following structure:
{
  "edges": [
    {
      "node": {
        "__resolveType": "Norm",
        "__id": 42942
      }
    },
    {
      "node": {
        "__resolveType": "Norm",
        "__id": 47617
      }
    },
  ....
  ],
  "totalCount": 185
}
  1. The sub-object node with the keys __resolveType and __id is returned for EACH found instance.

Expected behavior When querying on a connection including only totalCount, the irrelevant meta-information should NOT be returned to keep perfomance. Even to query for is a perfomance killer, I think. A simple cypher to aim for the count-store only should be sufficient:

MATCH (this0:Norm)
RETURN {totalCount:count(this0)}

System

andreloeffelmann avatar Mar 26 '25 08:03 andreloeffelmann

Hi @andreloeffelmann

totalCount is generally intended to be used as part of a paginated result. For the case you mention, try using Aggregation Operations which should not return the edges, unless requested. Another alternative is to use pagination to return less edges (totalCount will still be the total count of edges)

This is still a valid performance improvement that should be addressed

angrykoala avatar Mar 26 '25 11:03 angrykoala

Hi @angrykoala,

according to Aggregation Operations I have just tried the following:

normsConnection {
    aggregate {
      count {
        nodes
      }
    }
}

This results in a cypher query even worse:

CALL {
    MATCH (this:Norm)
    RETURN { nodes: count(DISTINCT this) } AS var0
}
CALL {
    WITH *
    MATCH (this1:Norm)
    WITH collect({ node: this1 }) AS edges
    WITH edges, size(edges) AS totalCount
    CALL {
        WITH edges
        UNWIND edges AS edge
        WITH edge.node AS this1
        RETURN collect({ node: { __id: id(this1), __resolveType: "Norm" } }) AS var2
    }
    RETURN var2, totalCount
}
RETURN { edges: var2, totalCount: totalCount, aggregate: { count: var0 } } AS this

Now in addition to the aggregate, the totalCount and the edges are queried, resulting in this response:

{
  "aggregate": {
    "count": {
      "nodes": 276
    }
  },
  "edges": [
    {
      "node": {
        "__resolveType": "Norm",
        "__id": 466073
      }
    },
    {
      "node": {
        "__resolveType": "Norm",
        "__id": 466074
      }
    },
....
  ],
  "totalCount": 276
}

andreloeffelmann avatar Mar 26 '25 12:03 andreloeffelmann

Hi @andreloeffelmann Sorry, my bad, this last query has a bug that causes that extra edges to be returned. Until that is fixed you can use the deprecated aggregate queries (I've just confirmed it doesn't return the edges) in version 6:

normsAggregate {
      count {
        nodes
    }
}

This one is deprecated and will be replaced by the query you tried in version 7 once we have fixed this issue.

Thanks for your patience on this one

angrykoala avatar Apr 04 '25 08:04 angrykoala

This issue is now fixed for the queries such as:

normsConnection {
    aggregate {
      count {
        nodes
      }
    }
}

It will be available on the next 6.x release, I'll reopen this issue as it is still not solved for totalCount

angrykoala avatar Apr 04 '25 15:04 angrykoala

Hi @angrykoala , thanks for the fast fix! I can confirm that counting with aggregate within a connection query now performs well :) As soon as totalCount is also considered, this can be closed I guess

andreloeffelmann avatar Apr 08 '25 11:04 andreloeffelmann

Hi @angrykoala due to #6206 we cannot use

aggregate {
      count {
        nodes
      }
}

on all our types. But totalCount is not working either because we face the perfomance issues reported above. So... can you fix either this or #6206? brgds

andreloeffelmann avatar Jul 29 '25 09:07 andreloeffelmann

Hi @andreloeffelmann

Would it be possible to apply the workaround in #6206 ?

The underlying cause of #6206 is a variable name conflict, which may affect other queries, not only aggregations, so just fixing this one will likely not be enough long term

I'll discuss #6206 with the team to see if we can find a fix for these naming conflicts

angrykoala avatar Jul 29 '25 10:07 angrykoala

Hi @angrykoala

unfortunately the workaround in #6206 is not feasible for us since we have to follow strict naming conventions.

How about fixing the perfomance issues for totalCount which are described in this ticket?

andreloeffelmann avatar Aug 05 '25 07:08 andreloeffelmann

Hi @andreloeffelmann Yeah, I'll be looking at this one as well, I'm first checking #6206 as its root cause will cause other issues, not just in aggregation.

I'll come back to you shortly with my findings on that one and will take a look at this as well, thanks for your patience 🙏

angrykoala avatar Aug 05 '25 08:08 angrykoala

Hi @andreloeffelmann

#6206 is quite tricky, as it may cause unexpected breaking changes if fixed for your use case. In the meantime, I've made a patch to fix this performance issue, along with another small query optimization, should be available in the next release soon

angrykoala avatar Aug 06 '25 10:08 angrykoala

Hi @angrykoala thanks for fixing this!

andreloeffelmann avatar Aug 12 '25 14:08 andreloeffelmann