graphql-php

Determine if two queries execute identically

Open spawnia opened this issue 5 years ago • 4 comments

Summary

It would be nice if this library could provide a mechanism to determine whether queries are identical with regard to their execution. This is useful for performance optimizations, including but not limited to subscriptions and caching.

What problem does this solve?

This recently came up while looking for a possible optimization strategy for resolving subscriptions, see https://github.com/nuwave/lighthouse/pull/1425#issuecomment-642932378

The way we implement subscriptions - I imagine others might as well - is that we basically store the subscription queries for later execution. When the subscription is triggered, we run the stored query again and push the result to clients.

In some cases, which we call public subscriptions, every client subscribes to the same stream of events. In a non-GraphQL world, we would just push the same result to every client. However, the dynamic subscription queries might be unique in that they pass different arguments, request different fields, alias some fields, etc.

Our current unoptimized implementation simply runs each single query individually to ensure every client gets the exact result they asked for. For expensive queries, it might be beneficial to group clients that sent equivalent queries together, resolve their queries once and serve them the same result.
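A minimal sketch of that grouping, in PHP: subscribers are bucketed by a precomputed equivalence key, each bucket is executed once, and the same encoded payload goes to every member. The function name `broadcastGrouped`, the subscriber array shape, and the `$execute` and `$push` callables are all hypothetical and not part of graphql-php or Lighthouse. How the key is derived is left open here, and the sketch assumes results do not depend on per-client context such as the authenticated user.

```php
<?php

declare(strict_types=1);

/**
 * Group subscribers by equivalence key, execute once per group, and push the
 * same JSON payload to every subscriber in that group.
 *
 * @param array<int, array{id: string, key: string, query: string, variables: array}> $subscribers
 * @param callable $execute Hypothetical executor: fn (string $query, array $variables): array
 * @param callable $push    Hypothetical transport: fn (string $subscriberId, string $json): void
 *
 * @return int Number of query executions that were needed
 */
function broadcastGrouped(array $subscribers, callable $execute, callable $push): int
{
    // Bucket subscribers that share the same equivalence key.
    $groups = [];
    foreach ($subscribers as $subscriber) {
        $groups[$subscriber['key']][] = $subscriber;
    }

    foreach ($groups as $group) {
        // Any member can represent the group, since their queries are equivalent.
        $first = $group[0];
        $json = json_encode(
            $execute($first['query'], $first['variables']),
            JSON_THROW_ON_ERROR
        );

        foreach ($group as $subscriber) {
            $push($subscriber['id'], $json);
        }
    }

    return count($groups);
}
```

With three subscribers of which two share a key, the executor runs twice instead of three times.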

Considered Alternatives

A naive approach would be to simply compare the query string + arguments, but that fails to normalize insignificant differences, such as whitespace.

The next best thing I came up with is to do a string comparison on the serialized version of the parsed query, but I feel this might still be suboptimal or not cover some edge cases.

spawnia • Jun 15 '20 18:06

Interesting stuff!

Did you try a series of parse and print to "normalize" the query string (with regard to whitespace)?

As for full-featured comparison... I think it is possible to write a "generic" tool that will inline all the fragments, deduplicate identical fields, and then compare each field in the selection set.

This is doable using a visitor, but it sounds like a rather big utility and will definitely have a bunch of edge cases (directives, aliases, unions, etc).

vladar • Jun 19 '20 08:06

> Did you try a series of parse and print to "normalize" the query string (with regard to whitespace)?

That would be part of a proper tool for sure, but it does not take variables into account. For two queries to be considered identical, their variables would have to match, too.
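To illustrate, here is a rough sketch of a grouping key that covers both parts. It assumes the query string has already been normalized, for example by a parse and print round trip (`Parser::parse` followed by `Printer::doPrint` in graphql-php). The helper names `canonicalizeVariables` and `executionKey` are made up for this example. Variable maps are sorted by key, since key order is insignificant there, while lists keep their order.

```php
<?php

declare(strict_types=1);

/**
 * Recursively sort associative arrays by key so that variable maps differing
 * only in key order serialize identically. Lists are left in order, because
 * element order is significant for list values.
 */
function canonicalizeVariables($value)
{
    if (!is_array($value)) {
        return $value;
    }

    // A list has consecutive integer keys starting at 0.
    $isList = array_values($value) === $value;

    // array_map() preserves keys when given a single array.
    $value = array_map('canonicalizeVariables', $value);

    if (!$isList) {
        ksort($value, SORT_STRING);
    }

    return $value;
}

/**
 * Hypothetical helper: derive a key from an already normalized query string,
 * the variables, and the operation name.
 */
function executionKey(string $normalizedQuery, array $variables = [], ?string $operationName = null): string
{
    $payload = json_encode(
        [
            'query' => $normalizedQuery,
            'variables' => canonicalizeVariables($variables),
            'operationName' => $operationName,
        ],
        JSON_THROW_ON_ERROR
    );

    return hash('sha256', $payload);
}
```

Two requests with the same normalized query and the same variables in a different key order end up with the same key, while a different list order yields a different key.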

> inline all the fragments, deduplicate identical fields, and then compare each field in the selection set.

Good thinking. There are multiple levels of optimization possible, depending on where we want to draw the boundary.

For example, we might want to normalize field ordering. That enables us to group queries that have the same fields, resolve them once, then simply reorder the results individually.
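The reordering step could look roughly like this, assuming the shared result is a plain array and each client's field order is available as a nested array (leaf fields as values, nested selections as `field => [children]`). That selection shape and the function names `reorderResult` and `reorderNested` are assumptions for illustration; graphql-php does not provide them. Aliases, fragments and abstract types are ignored here.

```php
<?php

declare(strict_types=1);

/**
 * Rebuild one result object so its keys follow the client's field order.
 *
 * @param array $data      Shared result for one object
 * @param array $selection Client field order, e.g. ['title', 'author' => ['name', 'id']]
 */
function reorderResult(array $data, array $selection): array
{
    $ordered = [];
    foreach ($selection as $key => $value) {
        // Integer keys mark leaf fields, string keys mark nested selections.
        $field = is_int($key) ? $value : $key;
        $children = is_int($key) ? null : $value;

        if (!array_key_exists($field, $data)) {
            continue;
        }

        $ordered[$field] = $children === null
            ? $data[$field]
            : reorderNested($data[$field], $children);
    }

    return $ordered;
}

/**
 * Apply a nested selection to a value that may be null, an object, or a list.
 */
function reorderNested($value, array $children)
{
    if (!is_array($value)) {
        return $value; // null or scalar: nothing to reorder
    }

    if (array_values($value) === $value) {
        // List: reorder every item, keep the list order itself.
        return array_map(fn ($item) => reorderNested($item, $children), $value);
    }

    return reorderResult($value, $children);
}
```

The shared result is resolved once; only this cheap reshaping runs per client.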

I think we can start with something relatively simple: queries whose results are exactly the same (as in: the server can send back the exact same JSON response).

spawnia • Jun 19 '20 10:06

My instinct is to start out with fundamental building blocks, such as a function to serialize query strings into a normalized form. We might serialize the AST for quick rehydration.

spawnia • Jun 19 '20 10:06

I had a similar question while trying to cache query results. The approach I used was to look at $info->lookAhead()->queryPlan() in the first query field resolver that was executed by graphql-php, and use a hash of the serialized plan as a kind of unique queryId.

That in combination with a hash of the serialized $args gives you a way to cache the result of the combination of query operation + args when you get back from executeQuery() in your main handler.

For the next query when you hit that resolver, you can check if you have a cache result and return null there - stopping further resolver executions, and then replace the result with your cached result in your main handler.

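public function this_query_resolver(...) {
    $resolver = function ($rootValue, $args, $context, ResolveInfo $info) use (...) {
        // lookAhead() returns a QueryPlan object; queryPlan() returns it as an array
        $queryPlan = $info->lookAhead();
        $queryId = 'this_query-' . md5(dump_query_plan($queryPlan->queryPlan()));
        // $cacheKey must also be made available to the main handler below
        $cacheKey = $queryId . '-' . md5(json_encode($args));
        if (is_cached($cacheKey)) {
            // stop further resolver execution; the main handler swaps in the cached result
            return null;
        }
        // resolve normally
        ...
    };
    return $resolver;
}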

$result = GraphQL::executeQuery(...);
// --> start execution until you hit the query field resolver that is actually requested
if (!empty($cacheKey)) {
    if (is_cached($cacheKey)) {
        // --> replace the null with the cached result
        $serializedResult = get_cached($cacheKey);
    } else {
        // --> cache the actual result for next time
        $serializedResult = $result->toArray();
        set_cached($cacheKey, $serializedResult);
    }
}

It would be better if we didn't need to wait until the first resolver to know which query operation, fields etc. were actually requested, but I didn't find a better entry point - at least not without adapting the graphql-php library itself.

But perhaps there's a better place for it, preferably one that could be used to override normal query execution?

mikespub • Feb 16 '21 16:02