When querying only for `totalCount` within a `*Connection` query, the generated cypher queries and returns too much data resulting in serious performance-issues
Describe the bug
When querying only for totalCount within a *Connection query, database-internal data is returned for each found node-instance resulting in a HUGE response. This totally kills the performance when having many nodes.
See the steps to reproduce. When running this query on our production database with Neo4j Desktop, the app completely dies and shuts down. And we only have around 2m nodes of this type. We have other types with ~80m nodes. Don't want to know what would happen if I executed this query on that type...
Type definitions
type Norm {
code: String!
}
To Reproduce
- Run a server with the type defs
- Execute the following Query:
query {
normsConnection {
totalCount
}
}
- The following cypher is generated
MATCH (this0:Norm)
WITH collect({ node: this0 }) AS edges
WITH edges, size(edges) AS totalCount
CALL {
WITH edges
UNWIND edges AS edge
WITH edge.node AS this0
RETURN collect({ node: { __id: id(this0), __resolveType: "Norm" } }) AS var1
}
RETURN { edges: var1, totalCount: totalCount } AS this
- This results in a response with the following structure:
{
"edges": [
{
"node": {
"__resolveType": "Norm",
"__id": 42942
}
},
{
"node": {
"__resolveType": "Norm",
"__id": 47617
}
},
....
],
"totalCount": 185
}
- The sub-object
nodewith the keys__resolveTypeand__idis returned for EACH found instance.
Expected behavior
When querying on a connection including only totalCount, the irrelevant meta-information should NOT be returned to keep perfomance.
Even to query for is a perfomance killer, I think.
A simple cypher to aim for the count-store only should be sufficient:
MATCH (this0:Norm)
RETURN {totalCount:count(this0)}
System
- Version: @neo4j/[email protected]
Hi @andreloeffelmann
totalCount is generally intended to be used as part of a paginated result. For the case you mention, try using Aggregation Operations which should not return the edges, unless requested. Another alternative is to use pagination to return less edges (totalCount will still be the total count of edges)
This is still a valid performance improvement that should be addressed
Hi @angrykoala,
according to Aggregation Operations I have just tried the following:
normsConnection {
aggregate {
count {
nodes
}
}
}
This results in a cypher query even worse:
CALL {
MATCH (this:Norm)
RETURN { nodes: count(DISTINCT this) } AS var0
}
CALL {
WITH *
MATCH (this1:Norm)
WITH collect({ node: this1 }) AS edges
WITH edges, size(edges) AS totalCount
CALL {
WITH edges
UNWIND edges AS edge
WITH edge.node AS this1
RETURN collect({ node: { __id: id(this1), __resolveType: "Norm" } }) AS var2
}
RETURN var2, totalCount
}
RETURN { edges: var2, totalCount: totalCount, aggregate: { count: var0 } } AS this
Now in addition to the aggregate, the totalCount and the edges are queried, resulting in this response:
{
"aggregate": {
"count": {
"nodes": 276
}
},
"edges": [
{
"node": {
"__resolveType": "Norm",
"__id": 466073
}
},
{
"node": {
"__resolveType": "Norm",
"__id": 466074
}
},
....
],
"totalCount": 276
}
Hi @andreloeffelmann Sorry, my bad, this last query has a bug that causes that extra edges to be returned. Until that is fixed you can use the deprecated aggregate queries (I've just confirmed it doesn't return the edges) in version 6:
normsAggregate {
count {
nodes
}
}
This one is deprecated and will be replaced by the query you tried in version 7 once we have fixed this issue.
Thanks for your patience on this one
This issue is now fixed for the queries such as:
normsConnection {
aggregate {
count {
nodes
}
}
}
It will be available on the next 6.x release, I'll reopen this issue as it is still not solved for totalCount
Hi @angrykoala , thanks for the fast fix! I can confirm that counting with aggregate within a connection query now performs well :) As soon as totalCount is also considered, this can be closed I guess
Hi @angrykoala due to #6206 we cannot use
aggregate {
count {
nodes
}
}
on all our types. But totalCount is not working either because we face the perfomance issues reported above.
So... can you fix either this or #6206?
brgds
Hi @andreloeffelmann
Would it be possible to apply the workaround in #6206 ?
The underlying cause of #6206 is a variable name conflict, which may affect other queries, not only aggregations, so just fixing this one will likely not be enough long term
I'll discuss #6206 with the team to see if we can find a fix for these naming conflicts
Hi @angrykoala
unfortunately the workaround in #6206 is not feasible for us since we have to follow strict naming conventions.
How about fixing the perfomance issues for totalCount which are described in this ticket?
Hi @andreloeffelmann Yeah, I'll be looking at this one as well, I'm first checking #6206 as its root cause will cause other issues, not just in aggregation.
I'll come back to you shortly with my findings on that one and will take a look at this as well, thanks for your patience 🙏
Hi @andreloeffelmann
#6206 is quite tricky, as it may cause unexpected breaking changes if fixed for your use case. In the meantime, I've made a patch to fix this performance issue, along with another small query optimization, should be available in the next release soon
Hi @angrykoala thanks for fixing this!