
Feature/optimize interfaces

Open jroith opened this issue 1 year ago • 6 comments

This is a fork that I'm currently maintaining internally and that optimises performance in certain cases. The implementation shown here is not particularly efficient and not fully general purpose. I'm providing it to share some ideas and because @angrykoala inquired about it in another MR in the cypher builder.

Let me try to quickly break down what is going on here.

We have a schema with many interfaces and wanted to achieve two goals:

  1. It should be possible to efficiently select very common interfaces in Cypher without OR-ing together a large number of labels.
  2. The library should no longer build unnecessarily large UNIONs, often nested, which result in long query planning times and giant query plans that also execute slowly.

We have addressed this for our case like this:

  1. We generate our schema from a meta-schema in a preprocessing step. There we declare, among other things, which interfaces should get a Neo4j label and which should not. We do this to strike a balance between adding too many labels for Neo4j to pack into the bitfield and having long selectors for base interfaces.
  2. Some types can also have a composite label where the combination of labels identifies the type.
  3. We add a field mainType to certain (most) types that redundantly stores the type of the node (via a populatedBy). This makes it easy to determine the type of a node even if base interface labels are present, without having to look at the schema and, more importantly, without having to add a string literal in the __resolveType property, for reasons that are explained below.
  4. We then add a LabelManager to the application (not the library, passing it in the context object) that knows how to build short node selectors (such as "A", "B", "B&C" or "I1&I2") given a type or interface. It does so based on the labels available for each type and looking up or down the hierarchy. This code is not included because it is not in the library and could be implemented differently.
  5. The LabelManager also indicates if the mainType property is known to be present on any specific type.
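Since the LabelManager itself is not part of this MR, here is a minimal sketch of what such a component could look like. Everything in it is an assumption made for illustration: the `TypeInfo` and `InterfaceInfo` shapes, the method names `selectorFor` and `hasMainType`, and the example types are hypothetical and are not taken from the library or from the fork. It only shows the idea of preferring a dedicated interface label and falling back to OR-ing the implementations when none exists.

```typescript
// Hypothetical sketch, not the fork's actual LabelManager.
interface TypeInfo {
  labels: string[];      // labels whose combination identifies the type
  hasMainType: boolean;  // whether the mainType property is populated
}

interface InterfaceInfo {
  label?: string;            // dedicated label, if the meta-schema assigned one
  implementations: string[]; // names of the concrete types
}

class LabelManager {
  private types: Map<string, TypeInfo>;
  private interfaces: Map<string, InterfaceInfo>;

  constructor(types: Map<string, TypeInfo>, interfaces: Map<string, InterfaceInfo>) {
    this.types = types;
    this.interfaces = interfaces;
  }

  // Builds a short Cypher label expression for a type or interface.
  selectorFor(name: string): string {
    const type = this.types.get(name);
    if (type) return type.labels.join("&"); // composite label, e.g. "B&C"

    const iface = this.interfaces.get(name);
    if (!iface) throw new Error(`Unknown type or interface: ${name}`);
    if (iface.label) return iface.label;    // one label covers every implementation

    // No dedicated label: fall back to OR-ing the implementations.
    return iface.implementations.map((t) => this.selectorFor(t)).join("|");
  }

  // True if mainType is known to be present on the type, or on every
  // implementation of the interface.
  hasMainType(name: string): boolean {
    const type = this.types.get(name);
    if (type) return type.hasMainType;
    const iface = this.interfaces.get(name);
    if (!iface) return false;
    return iface.implementations.every((t) => this.types.get(t)?.hasMainType === true);
  }
}

// Illustrative data only.
const manager = new LabelManager(
  new Map<string, TypeInfo>([
    ["Movie", { labels: ["Movie"], hasMainType: true }],
    ["Series", { labels: ["Series"], hasMainType: false }],
    ["Short", { labels: ["Movie", "Short"], hasMainType: true }],
  ]),
  new Map<string, InterfaceInfo>([
    ["Production", { label: "Production", implementations: ["Movie", "Series", "Short"] }],
    ["Rated", { implementations: ["Series", "Short"] }],
  ]),
);

const productionSelector = manager.selectorFor("Production"); // "Production"
const shortSelector = manager.selectorFor("Short");           // "Movie&Short"
const ratedSelector = manager.selectorFor("Rated");           // "Series|Movie&Short"
```

The sketch leaves out the lookups up and down the hierarchy described above, as well as any attempt to find the shortest possible expression.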

These changes are made, among other reasons, to make the branches of a UNION as similar as possible and ideally identical. If an interface is queried, we can try to use a common label instead, or a common expression if no label is available. Likewise, we can replace the string literal in __resolveType with the mainType property. This is usually enough to make the code in different UNION branches identical. If a "... on Foo" notation is used, or in other cases such as authorisation, the code may still differ. We then find all branches that are identical, collapse them into a single one, and leave the other branches in place, excluding the combined ones using a predicate.
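The collapsing step can be sketched roughly as follows. This is a simplified illustration under stated assumptions and not the code from the fork: the `Branch` and `CollapsedBranch` types, the `collapseBranches` function and the Cypher strings are all hypothetical. It only demonstrates grouping branches whose generated Cypher text is identical. Generating the exclusion predicate is not shown.

```typescript
// Hypothetical sketch: group UNION branches by their generated Cypher text.
interface Branch {
  typeName: string; // concrete type this branch was generated for
  cypher: string;   // generated Cypher for the branch
}

interface CollapsedBranch {
  typeNames: string[]; // all types that produced exactly this Cypher
  cypher: string;
}

function collapseBranches(branches: Branch[]): CollapsedBranch[] {
  // A Map keeps insertion order, so the output follows the original branch order.
  const byCypher = new Map<string, CollapsedBranch>();
  for (const branch of branches) {
    const existing = byCypher.get(branch.cypher);
    if (existing) {
      existing.typeNames.push(branch.typeName);
    } else {
      byCypher.set(branch.cypher, { typeNames: [branch.typeName], cypher: branch.cypher });
    }
  }
  return Array.from(byCypher.values());
}

// Illustrative strings only; the real generated Cypher looks different.
const shared = "MATCH (n:Production) RETURN n { .title, __resolveType: n.mainType }";
const collapsed = collapseBranches([
  { typeName: "Movie", cypher: shared },
  { typeName: "Series", cypher: shared },
  // A "... on Short" selection adds a field, so this branch stays separate.
  { typeName: "Short", cypher: "MATCH (n:Production) RETURN n { .title, .runtime, __resolveType: n.mainType }" },
]);
// collapsed has two entries: one for Movie and Series, one for Short.
```

Comparing whole strings only works because the branches were first rendered with a common selector and the mainType property; with per-type labels and string literals in __resolveType, no two branches would compare equal.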

In practice this sometimes brings the original execution time down from 5 seconds to 100 ms, mostly by cutting the long query planning times. A drawback is that the query generation itself is inefficient: the query has to be built twice, recursively, before being compressed, and we only compare the resulting Cypher strings, which is robust but again not efficient. Since there is no cache in the library, this is not optimal.

Nevertheless, it is still much better for our cases, and the added cost is negligible.

The patched library is a drop-in replacement: it has no effect unless the labelManager is present on the context object, and it does not expose any new APIs.

Although I don't really have any hope that this MR will be merged, perhaps it can provide some useful ideas for the future that may help to improve the query generation for interfaces to a point where the fork is no longer necessary and can be dropped.

jroith, Jan 24 '24 16:01