Querying an interface produces a very slow query with many UNIONs
Describe the bug
When querying an interface, neo4j generates a UNION for every subtype of that interface, resulting in very slow queries even if you don't request any type conditions.
Type definitions
interface Parent {
id: ID
label: String
}
type Child1 implements Parent {
id: ID
label: String
}
type Child2 implements Parent {
id: ID
label: String
}
...
type Child10 implements Parent {
id: ID
label: String
}
To Reproduce
Populate some nodes and relationships, then execute this GraphQL query:
query SlowQuery {
parents(where: {id: "113"}) {
id
label
}
}
Using
The generated query resembles:
CALL {
MATCH (this0:Child1 {id: $param0})
WITH this0 { .id, __resolveType: "Child1", __id: id(this0) } AS this0
RETURN this0 AS this
UNION
MATCH (this1:Child2 {id: $param1})
WITH this1 { .id, __resolveType: "Child2", __id: id(this1) } AS this1
RETURN this1 AS this
UNION
...
MATCH (this9:Child10 {id: $param9})
WITH this9 { .id, __resolveType: "Child10", __id: id(this9) } AS this9
RETURN this9 AS this
}
WITH this
RETURN this as this
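To make the growth pattern concrete, here is a minimal TypeScript sketch that rebuilds the query shape above for an arbitrary list of implementing types. `buildInterfaceUnionQuery` is a hypothetical helper written for illustration; it is not the translation code used by @neo4j/graphql and only mimics the output shown in this report.

```typescript
// Hypothetical helper, NOT the actual @neo4j/graphql translation code.
// It only reproduces the query shape from this report to show how it grows.
interface UnionQuery {
  cypher: string;
  params: Record<string, string>;
}

function buildInterfaceUnionQuery(typeNames: string[], id: string): UnionQuery {
  const params: Record<string, string> = {};
  const branches = typeNames.map((typeName, i) => {
    // The same filter value is repeated once per implementing type.
    params[`param${i}`] = id;
    return [
      `MATCH (this${i}:${typeName} {id: $param${i}})`,
      `WITH this${i} { .id, __resolveType: "${typeName}", __id: id(this${i}) } AS this${i}`,
      `RETURN this${i} AS this`,
    ].join("\n");
  });
  const cypher = `CALL {\n${branches.join("\nUNION\n")}\n}\nWITH this\nRETURN this AS this`;
  return { cypher, params };
}

// Ten implementing types, as in the type definitions of this report.
const typeNames = Array.from({ length: 10 }, (_, i) => `Child${i + 1}`);
const { cypher, params } = buildInterfaceUnionQuery(typeNames, "113");
const unionCount = cypher.split("\nUNION\n").length - 1;
console.log(unionCount, Object.keys(params).length);
```

With ten implementing types this yields nine UNION separators and ten parameters, so both the number of branches and the number of parameters grow linearly with the number of types implementing the interface.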
Execution plan with my real-world schema for the generated query (the structure is identical, only the names are changed):
Expected behavior
This should be instantaneous, since we're requesting a single node with a unique id filter. However, it can take up to several seconds to execute, because each successive UNION has to deduplicate its rows to keep only distinct values.
System (please complete the following information):
- OS: reproduced on several kinds of OS
- Version: @neo4j/[email protected]
- Node.js version: 20.x.x
Many thanks for raising this bug report @Masadow. :bug: We will now attempt to reproduce the bug based on the steps you have provided.
Please ensure that you've provided the necessary information for a minimal reproduction, including but not limited to:
- Type definitions
- Resolvers
- Query and/or Mutation (or multiple) needed to reproduce
If you have a support agreement with Neo4j, please link this GitHub issue to a new or existing Zendesk ticket.
Thanks again! :pray:
This is a big problem even when using interfaces in @relationship. If a field's type is an interface implemented by around 10 types, the generated unions take a long time to resolve, even if there is only one actual node in the database corresponding to that @relationship.
I've been working on reproducing this issue with the following in Neo4j 5:
Typedefs
interface Parent {
id: String
label: String
}
type Child1 implements Parent {
id: String
label: String
}
type Child2 implements Parent {
id: String
label: String
}
type Child3 implements Parent {
id: String
label: String
}
type Child4 implements Parent {
id: String
label: String
}
type Child5 implements Parent {
id: String
label: String
}
type Child6 implements Parent {
id: String
label: String
}
type Child7 implements Parent {
id: String
label: String
}
type Child8 implements Parent {
id: String
label: String
}
type Child9 implements Parent {
id: String
label: String
}
type Child10 implements Parent {
id: String
label: String
}
Data
UNWIND range(1, 1000) AS id
CREATE(:Child1 {id: id+"c1", label: "c1"})
CREATE(:Child2 {id: id+"c2", label: "c1"})
CREATE(:Child3 {id: id+"c3", label: "c1"})
CREATE(:Child4 {id: id+"c4", label: "c1"})
CREATE(:Child5 {id: id+"c5", label: "c1"})
CREATE(:Child6 {id: id+"c6", label: "c1"})
CREATE(:Child7 {id: id+"c7", label: "c1"})
CREATE(:Child8 {id: id+"c8", label: "c1"})
CREATE(:Child9 {id: id+"c9", label: "c1"})
CREATE(:Child10 {id: id+"c10", label: "c1"})
RETURN id
Query:
query SlowQuery {
parents(where: {id: "200c2"}) {
id
label
}
}
Is this setup an accurate reproduction of your issue, @Masadow?
With this setup, I compared against a query that targets the child type directly by its label (essentially the fastest way to get that element with GraphQL):
query FastQuery {
child2s(where: {id: "202c2"}) {
id
label
}
}
I noticed that the interface query takes around 3x as long to complete as the direct query in GraphQL (and ~6x when running the Cypher directly). Is that roughly the degradation you experience, or am I missing something in my setup that would make it worse?
It would help to know the rough scale of your data and which version of the database you are running.
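When comparing timings like the 3x and ~6x figures above, a single run can be skewed by warm-up and caching. The following is a minimal sketch of one way to summarise repeated runs. The helper names `median` and `slowdown` are hypothetical, and the sample numbers are placeholders for illustration, not measurements from this thread.

```typescript
// Median of a list of durations in milliseconds.
// The median is less sensitive to warm-up outliers than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ratio of the interface query's median time to the direct query's median time.
function slowdown(interfaceMs: number[], directMs: number[]): number {
  return median(interfaceMs) / median(directMs);
}

// Placeholder durations for illustration only (NOT real measurements).
const exampleRatio = slowdown([30, 33, 90], [10, 11, 12]);
console.log(exampleRatio);
```

The durations themselves would come from timing the SlowQuery and FastQuery requests several times each. Reporting the median ratio alongside the data size and database version makes the numbers easier to compare across setups.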