Metrics for multiplexes are a bit broken/inaccurate.
Edit: Problems #1 and #2 appear to be non-problems. See below.
As demonstrated in this PR, and in the below multiplex scenario walkthrough, timings for parsing, lexing and query duration are wrong when executing multiplexes.
There are last-write wins problems in lexing/parsing timings in the Tracer, and query durations are inaccurate because we wait until Instrumentation.after_query to calculate them.
The queries being executed in the throwaway PR are:
test 'multiplex test' do
queries = [
{
query: 'query QueryOne { post(id: "42") { id title } }',
operation_name: 'QueryOne',
},
{
query: 'query QueryTwo { post(id: "42") { body } }',
operation_name: 'QueryTwo',
},
]
SchemaWithFullMetrics.multiplex(queries)
# ...
- Tracer.trace(execute_multiplex)
- Capturing pre_context.multiplex_start_time_* in pre_context
- NOTE: So far so good. This is captured once at the outset, and is stored in each query's context once it exists.
- Tracer.trace(lex)
- Capturing pre_context.lexing_* in pre_context
- Problem
#1: This value will be overwritten by any later documents in the multiplex which are lexed.
- Tracer.trace(parse)
- Capturing pre_context.parsing_* in pre_context
- Problem
#2: This value will be overwritten by any later documents in the multiplex which are parsed.
- Tracer.trace(validate)
- Capturing last pre_context.parsing_, lexing_ times in this query's context
- Capturing validation time in this query's context
- NOTE: This first query will correctly receive the lex and parse times in its context.
- NOTE: This query will correctly store its validation time.
- Tracer.trace(lex)
- Capturing pre_context.lexing_* in pre_context
- Problem
#1
- Tracer.trace(parse)
- Capturing pre_context.parsing_* in pre_context
- Problem
#2
- Tracer.trace(validate)
- Capturing last pre_context.parsing_, lexing_ times in this query's context
- Capturing validation time in this query's context
- Problems
#1: This nth query will store the first query's lex and parse times. - NOTE: This query will correctly store its validation time.
- Tracer.trace(analyze_multiplex)
- Here we clear the pre_context.
- Tracer.trace(analyze_query)
- (Current operation:
QueryOne) - Analyzer.on_leave_field - id
- Analyzer.on_leave_field - title
- Analyzer.on_leave_field - post
- Capturing analysis time in this query's context
- (Current operation:
- Tracer.trace(analyze_query)
- (Current operation:
QueryTwo) - Analyzer.on_leave_field - body
- Analyzer.on_leave_field - post
- Capturing analysis time in this query's context
- (Current operation:
- Tracer.trace(execute_query)
- Capturing execute time in this query's context
- Tracer.trace(authorized)
- Tracer.trace(execute_field)
- Capturing field execution time in this query's context
- Tracer.trace(authorized)
- Tracer.trace(execute_field)
- in Post.id resolver
- Capturing field execution time in this query's context
- Tracer.trace(execute_field)
- in Post.title resolver
- Capturing field execution time in this query's context
- Tracer.trace(execute_query)
- Capturing execute time in this query's context
- Tracer.trace(authorized)
- Tracer.trace(execute_field)
- Capturing field execution time in this query's context
- Tracer.trace(authorized)
- Tracer.trace(execute_field)
- in Post.body resolver
- Capturing field execution time in this query's context
- Tracer.trace(execute_query_lazy)
- NOTE: Unhandled.
- Instrumentation.after_query - query_duration extracted for current query
- Making use of the start time set when the
execute_querytracer event fired. - Problem
#3: Waiting untilInstrumentation.after_queryproduces inaccurate query execution times. We should calculatequery_durationin the Tracer instead, after theyieldincapture_query_start_time(and rename that method for accuracy).
- Making use of the start time set when the
- Instrumentation.after_query - query_duration extracted for current query
- Making use of the start time set when the
execute_querytracer event fired. - Problem
#3.
- Making use of the start time set when the
cc: @eapache @RobertWSaunders
This explanation makes sense to me, though it raises one other question: I consistently see execute_multiplex running after lex and parse, and you seem to consistently see it running before. What's going on here?
@eapache In those instances, are you passing .multiplex an already-parsed GraphQL::Query or a document string? This is my best guess as to why the order might differ.
This is in core, which instantiates a GraphQL::Query instance with the string, and then calls .result on it. AFAICT this intentionally lexes/parses before it enters the multiplexer: https://github.com/rmosolgo/graphql-ruby/blob/5c3a8ec6f45319f40cb59f2ffb5961bb1071aa80/lib/graphql/query.rb#L198-L200
This may be a genuine inconsistency within the graphql-ruby gem depending on the entrypoint you use?
Yes: https://github.com/rmosolgo/graphql-ruby/issues/3393
On second thought: Problems 1 and 2 might not actually be problems?
While it's true that the pre_context is not aware of mulitplexes (and multiple queries, contexts), we do run parse, lex and validate in that order, each time we run validate, we're using the values for parsing and lexing time that we last stored.
So long as graphql-ruby never runs something like parse (query 1), lex (query 1), parse (query 2), lex (query 2), validate (query 1), validate (query 2), and always does parse 1, lex 1, validate 1, parse n, lex n, validate n; we're ok?
Maybe the real problem is that if you pass a multiplex an unparsed document string and an already parsed query object (as one of our existing tests does), you don't always get a validate event that is preceded by a parse/lex events for the same query.
If that's true, maybe we just need to clear the pre_context after validate makes use of any values within?
tldr after sync:
- As long as
parse,lexandvalidaterun in that order (without interleaving across queries in a multiplex) Problems#1 and #2as described above are not real. -
Problem #3should be addressed by 1) still providing query execution time that includes the one-and-onlyexecute_query_lazycall, shared between queries in a multiplex, and 2) emit a separate metric per query that exclude lazy execution time, by recording an "end" time after the yield incapture_query_start_time(again, to be renamed appropriately). - We should make all of this more clear with method/variable renames, comments etc. in Tracer in particular, to ensure it's clear what/when/why is being handed off between
pre_contextand context. - We should clear the
pre_contextimmediately after obtaining values we need from it duringvalidate, so we don't have a problem where a mixture of pre-parsed and parsed queries get incorrect values invalidate, having assumed thatlex/parseran prior for the same query.