
Search profile telemetry device

Open nik9000 opened this issue 2 years ago • 3 comments

It'd be sweet if I could add something like --telemetry search-profile and Rally could invoke searches one additional time with profile: true and save the profile. I'm thinking something like:

  1. If the operation-type is search.
  2. After the warmup and measurement phases
  3. Rerun the search one final time, adding profile: true to the top level of the search
  4. Save the results to the telemetry cluster
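A minimal sketch of steps 1 to 3, assuming the search body is a plain dict. Both `maybe_profile` and the `run_search` callable are hypothetical names used for illustration and are not part of Rally; step 4 (saving the result) is left out.

```python
import copy


def with_profile(search_body):
    """Return a copy of the search body with profile enabled at the top level."""
    profiled = copy.deepcopy(search_body)
    profiled["profile"] = True
    return profiled


def maybe_profile(operation_type, search_body, run_search):
    """Rerun a search one final time with profiling; skip other operation types.

    run_search is a hypothetical callable that sends the body to Elasticsearch
    and returns the parsed JSON response as a dict.
    """
    if operation_type != "search":
        return None
    response = run_search(with_profile(search_body))
    # Elasticsearch returns the profiler output under the "profile" key.
    return response.get("profile")
```

Copying the body keeps the profiled rerun from leaking `profile: true` into the requests used for the measured iterations.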

Maybe we could add them to the output somehow but I'm not super sure how. The profile output is pretty free form and you really just need to dig into it with jq most of the time to grab things. And we're not running it on every execution so the perf numbers that come out aren't super useful. Mostly we're looking for stuff like "what queries were actually executed?" "what implementation decisions did the aggregation make?" "how many segments got to use optimal paths?" "did the fetch phase get to take optimal paths?"

nik9000 avatar May 06 '22 14:05 nik9000

@tmgordeeva, @salvatore-campagna, and I talked about doing this by hand a few days ago. And @DJRickyB and I just talked about wanting to get this information from a nightly. So maybe it's worth building something to do this.

nik9000 avatar May 06 '22 14:05 nik9000

The feature as you've described it is not supported by the current structure of Rally

  • Most pedantically, I don't think this would be a telemetry device as we've never coupled telemetry devices with specific operations before
  • And-one execution may be doable, but running this via the current mechanism would mean at best sticking this in a new iteration type (profile in addition to warmup and normal) and nesting the profile in the meta field of the metric object.

How about either of these:

  • We influence global/local profiling via --profile=true in the CLI or "profile": true at the task-level, exactly as we have --on-error=abort or "ignore-response-error-level": "non-fatal". This is not meant for true measurements but will generate a profile for each configured iteration, and stick it in the metric store in the meta field as described above
  • We add a new profile sub-command which (like --test-mode) runs queries a limited number of times (once?) BUT outputs a JSON file where the top-level keys are the tasks (which we already enforce uniqueness on) and the values are the response objects from the search type tasks, including with profile: true in the request. We'd still support track parameters (to influence things like ingest_percentage) and --include-tasks and --exclude-tasks

I think I like the second one better, and it doesn't involve retrieval from a larger document in the metrics store.
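A rough sketch of what the second option's output could look like, assuming tasks are available as dicts with `name`, `operation-type`, and `body` keys. That task shape, `collect_profiles`, `dump_profiles`, and `run_search` are all hypothetical and only illustrate the idea of one JSON file keyed by task name.

```python
import json


def collect_profiles(tasks, run_search):
    """Run each search task once with profile: true, keyed by unique task name.

    tasks is a list of dicts with the hypothetical keys "name",
    "operation-type" and "body"; run_search is a hypothetical callable
    that executes a search body and returns the parsed response.
    """
    results = {}
    for task in tasks:
        if task["operation-type"] != "search":
            continue  # only search-type tasks are profiled
        body = dict(task["body"], profile=True)
        results[task["name"]] = run_search(body)
    return results


def dump_profiles(results, path):
    """Write the collected responses to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
```

Because task names are already enforced to be unique, they can serve as the top-level keys without collisions.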

DJRickyB avatar May 06 '22 20:05 DJRickyB

I think it's useful to get the profiling against a fully loaded data set that's "hot" from a benchmark run. It's a little more "real".

I do like the idea of dumping this to a json file locally, though it could be useful to get the profile results from last night's benchmark run, so it might not make sense to just write them to a json file. For what it's worth I tend to use jq on the results of running the profiler to dig out the interesting pieces. Having the whole thing is nice, but I often have to tabularize it with jq and then dig further. So a json file would be wonderful.
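As an illustration of the kind of tabularizing described above, here is a small sketch that flattens the query tree of a profiled search response into rows. It assumes the usual `profile.shards[].searches[].query[]` layout with `type`, `description`, `time_in_nanos`, and `children` on each node; `flatten_query_profile` is a made-up helper name.

```python
def flatten_query_profile(response):
    """Flatten the nested query profile of a search response into flat rows."""
    rows = []

    def walk(shard_id, node, depth):
        rows.append({
            "shard": shard_id,
            "depth": depth,  # nesting level inside the query tree
            "type": node.get("type"),
            "description": node.get("description"),
            "time_in_nanos": node.get("time_in_nanos"),
        })
        for child in node.get("children", []):
            walk(shard_id, child, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                walk(shard.get("id"), node, 0)
    return rows
```

The rows answer the "what queries were actually executed?" question directly; aggregation and fetch sections of the profile would need their own walkers.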

I don't really care about whether or not this is a telemetry device. I guess to me it feels like a telemetry device because it gets extra data about the run. But regular telemetry devices don't work that way, for sure. Also! It might disrupt the benchmark. It's rare, but possible for profile: true to convince the JVM that a particular call is megamorphic. If that's on the hot path it can slow down queries. It shouldn't, but computers are fun!

nik9000 avatar May 09 '22 12:05 nik9000