
[RFC] Profiling Extensibility

Open neuenfeldttj opened this issue 7 months ago • 8 comments

Is your feature request related to a problem? Please describe

Currently, the core profiler only supports timing information for queries (it times weight creation, scoring, matching, etc.). Two limitations arise from this: a user cannot add more information to the breakdown, and a plugin cannot add any profiling information. It would be useful for any plugin involved in the search process to be able to contribute profiling information if it wants to provide telemetry on the search query. Currently, there are no extension points for plugins to do this.

The first limitation arises from the fact that the code is too narrowly focused on basic query timing information, strictly using the QueryTimingType enum. Abstracting some of these classes leaves room to design solutions for plugins to add information. In addition to not being able to supply additional timing information, the current implementation does not allow non-timing information (e.g. the size of the result set after a filter) to be included in the breakdown. Extending the capability of the profiler would allow plugins to add useful profiling information.

Describe the solution you'd like

Context

This is the general design of the profiler and how it interacts with search:

Image

Profile Breakdown

Current Breakdown Structure

Image

Proposed Breakdown Structure

Image

This is the key component of the profiler. Each type of profiler has its own information that it wants to display through a breakdown. Currently, the breakdown only supports timing information, but abstracting it allows for other kinds of metrics:

public abstract class AbstractProfileBreakdown {

    /** Sole constructor. */
    public AbstractProfileBreakdown() {}

    /**
     * Gather important metrics for the current instance.
     */
    public abstract Map<String, Long> toImportantMetricsMap();

    /**
     * Build a breakdown for the current instance.
     */
    public abstract Map<String, Long> toBreakdownMap();

    /**
     * Fetch extra debugging information.
     */
    public Map<String, Object> toDebugMap() {
        return emptyMap(); // static import of java.util.Collections.emptyMap
    }
}
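As a sketch of what a plugin-side, non-timing breakdown could look like under this abstraction (the base class here is a minimal stand-in for the proposed core class, and the metric names are hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BreakdownSketch {

    // Minimal stand-in for the proposed core base class.
    public abstract static class AbstractProfileBreakdown {
        public abstract Map<String, Long> toImportantMetricsMap();
        public abstract Map<String, Long> toBreakdownMap();
        public Map<String, Object> toDebugMap() {
            return Collections.emptyMap();
        }
    }

    // Hypothetical plugin breakdown that records counters rather than timings,
    // e.g. the size of the result set after a filter.
    public static class FilterStatsBreakdown extends AbstractProfileBreakdown {
        private long resultSetSize;
        private long filteredDocs;

        public void setResultSetSize(long n) { resultSetSize = n; }
        public void incrementFilteredDocs(long n) { filteredDocs += n; }

        @Override
        public Map<String, Long> toImportantMetricsMap() {
            return Map.of("result_set_size", resultSetSize);
        }

        @Override
        public Map<String, Long> toBreakdownMap() {
            Map<String, Long> map = new HashMap<>();
            map.put("result_set_size", resultSetSize);
            map.put("filtered_docs", filteredDocs);
            return map;
        }
    }
}
```

Nothing in the base class assumes timing, so a subclass like this can emit any long-valued metric into the breakdown.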

Profile Timing Breakdown

Queries, or code in a plugin, might want to time specific operations, so it would be nice to have another abstraction just for timing information, where the timers are kept in a map keyed by the timing types specified for this breakdown:

/**
 * Base class for all timing profile breakdowns.
 */
public abstract class AbstractTimingProfileBreakdown extends AbstractProfileBreakdown {

    protected final Map<String, Timer> timers = new HashMap<>();

    public AbstractTimingProfileBreakdown() {}

    public Timer getTimer(String type) {
        ...
    }

    public long toNodeTime() {
        ...
    }

    /**
     * Build a timing count breakdown for current instance
     */
    @Override
    public Map<String, Long> toBreakdownMap() {
        ...
    }
}
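One possible way the elided method bodies could work is to create timers lazily per string key, which also ties into the enum-elimination idea discussed below. This is a sketch, not the actual implementation; the Timer class here is a simplified stand-in for the core Timer:

```java
import java.util.HashMap;
import java.util.Map;

public class TimingBreakdownSketch {

    // Simplified stand-in for the core Timer class.
    public static class Timer {
        private long start;
        private long approximateTiming;
        private long count;

        public void start() { start = System.nanoTime(); }
        public void stop() {
            approximateTiming += System.nanoTime() - start;
            count++;
        }
        public long getApproximateTiming() { return approximateTiming; }
        public long getCount() { return count; }
    }

    private final Map<String, Timer> timers = new HashMap<>();

    // Create the timer on first use, so no enum needs to predeclare it.
    public Timer getTimer(String type) {
        return timers.computeIfAbsent(type, t -> new Timer());
    }

    // Total time spent across all recorded phases.
    public long toNodeTime() {
        return timers.values().stream().mapToLong(Timer::getApproximateTiming).sum();
    }

    // Emit both the time and the invocation count per phase.
    public Map<String, Long> toBreakdownMap() {
        Map<String, Long> map = new HashMap<>();
        for (Map.Entry<String, Timer> e : timers.entrySet()) {
            map.put(e.getKey(), e.getValue().getApproximateTiming());
            map.put(e.getKey() + "_count", e.getValue().getCount());
        }
        return map;
    }
}
```

Because the breakdown iterates over whatever timers exist, a plugin-defined key such as "ann_search" shows up in the map with no core-side changes.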

Profile Tree

This is another important part of the profiler. It keeps track of each breakdown in the query tree by constructing its own tree, and it does not need any change from the current implementation. Each profiler has its own subclass of the profile tree because it needs to return the specific breakdown instance for that profiler. These subclasses can also contain other information (e.g. timing the rewrite of a query). The tree keeps track of the breakdown instances along with the queries associated with each breakdown.

public abstract class AbstractInternalProfileTree<PB extends AbstractProfileBreakdown, E>

Profile Result

This gets created after the query is done and the entire ProfileTree is available. Each breakdown gets converted to a result. In order to statically parse any ProfileResult, a new map, importantMetrics, was added to hold this top-level information. Currently, time_in_nanos and similar values are displayed there:

public class ProfileResult implements Writeable, ToXContentObject {
    protected final String type;
    protected final String description;
    protected final Map<String, Long> importantMetrics;
    protected final Map<String, Long> breakdown;
    protected final Map<String, Object> debug;
    protected List<ProfileResult> children;
}

Profile Shard Result

A profile result gets converted into a profile shard result to be sent to other nodes. This should be abstracted, since plugins (or aggregations) may not need to send additional shard-level information (rewrite time and collector info, in the case of a query).

Profiler

This was already abstracted, but a profile shard result generic was added to make it cleaner to create a shard result based on the profiler.

public abstract class AbstractProfiler<PB extends AbstractProfileBreakdown, E, SR extends AbstractProfileShardResult> {
    ...
    public abstract SR createProfileShardResult();
}

Profilers

This is what actually gets created when the profile flag is specified. It has a list of QueryProfiler instances (it starts off with one query profiler, but aggregation adds more to the list) and an AggregationProfiler. For the Solution 1 PoC, a list of AbstractProfiler for plugins was added. It uses the SearchPlugin to get the plugins' profilers (if specified) and adds them to the list. Importantly, this is where concurrency is specified when creating profilers. With Solution 1 (just like the current code), all of these classes have to have a concurrent subclass that specifies what to do in the concurrent case.

Proposed Solutions

The proposal has two high-level suggestions:

1) Single Profiler design

The SearchPlugin has a hook that accepts a Class<AbstractTimingProfileBreakdown>. This means the core side keeps a single profile tree structure (InternalQueryProfileTree), and the query profiler maintains a reference to the plugin's profile breakdown class. When the profile tree needs to create an instance, it simply instantiates that class. The plugin only has to create an AbstractProfileBreakdown subclass.

This would be the theoretical output of this design:

"profile" : {
    "shards" : [
      {
        "id" : "[pxAj3PB9QPqb4R-YcH-yMQ][test_index][0]",
        "inbound_network_time_in_millis" : 0,
        "outbound_network_time_in_millis" : 0,
        "searches" : [
          {
            "query" : [
              {
                "type" : "KNNQuery",
                "description" : "",
                "important_metrics" : {
                  "avg_slice_time_in_nanos" : 0,
                  "min_slice_time_in_nanos" : 123456,
                  "time_in_nanos" : 123456,
                  "max_slice_time_in_nanos" : 123456
                },
                "breakdown" : {
                  "avg_score_count" : 4,
                  "next_doc" : 2000625,
                  "score_count" : 19,
                  "score" : 1991375,
                  "max_next_doc_count" : 15,
                  "ann_search" : 19130750,
                  "search_leaf_count" : 10,
                  "bitset_creation" : 18868750,
                  /* Both plugin info and query info in same breakdown */
                  "build_scorer" : 20872417,
                  "avg_ann_search_count" : 2,
                  "create_weight" : 196542,
                  "avg_next_doc_count" : 7,
                  "ann_search_count" : 10,
                  "max_search_leaf_count" : 3,
                  "min_score_count" : 2
                }
              }
            ],
            "rewrite_time" : 12207,
            "collector" : [
              {
                ...
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
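The register-a-class-and-instantiate flow of Solution 1 could look roughly like the following. This is a sketch with minimal stand-in types, not the actual SearchPlugin hook signature:

```java
import java.util.Map;

public class SingleProfilerSketch {

    // Minimal stand-in for the proposed core base class.
    public abstract static class AbstractProfileBreakdown {
        public abstract Map<String, Long> toBreakdownMap();
    }

    // Hypothetical plugin-supplied breakdown; under Solution 1 this is the
    // only class the plugin has to write.
    public static class KnnBreakdown extends AbstractProfileBreakdown {
        @Override
        public Map<String, Long> toBreakdownMap() {
            return Map.of("ann_search", 0L);
        }
    }

    // The core profile tree would hold the Class object registered via the
    // SearchPlugin hook and instantiate it whenever a node needs a breakdown.
    public static AbstractProfileBreakdown newBreakdown(
        Class<? extends AbstractProfileBreakdown> clazz
    ) throws ReflectiveOperationException {
        return clazz.getDeclaredConstructor().newInstance();
    }
}
```

The core never needs to know the concrete type; it only requires a no-arg constructor, which is one design constraint this approach places on plugin breakdown classes.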

2) Multi-profiler design

The SearchPlugin has a hook that accepts an AbstractProfiler. Profilers maintains a list of the profilers provided by any plugin, and the final output contains a separate tree structure for each plugin profiler. The plugin would have to implement many classes, such as AbstractProfileBreakdown, AbstractProfiler, AbstractProfileTree, AbstractShardResult, etc. A query can run concurrently, and the profiler has separate classes (profile tree, profile breakdown, etc.) to handle profiling stats across slices; since the plugin implements its own profiler, it would also have to implement the concurrent versions of these classes. An enum would still be used for timing, but it would be specific to each profiler (core would only have to handle the basic query timing types).

This would be the theoretical output of this design:

"profile" : {
    "shards" : [
      {
        "id" : "[pxAj3PB9QPqb4R-YcH-yMQ][test_index][0]",
        "inbound_network_time_in_millis" : 0,
        "outbound_network_time_in_millis" : 0,
        "searches" : [
          {
            "query" : [
              {
                "type" : "KNNQuery",
                "description" : "",
                "important_metrics" : {
                  "avg_slice_time_in_nanos" : 0,
                  "min_slice_time_in_nanos" : 123456,
                  "time_in_nanos" : 123456,
                  "max_slice_time_in_nanos" : 123456
                },
                "breakdown" : {
                  "avg_score_count" : 4,
                  "next_doc" : 2000625,
                  "score_count" : 19,
                  "score" : 1991375,
                  "max_next_doc_count" : 15,
                  /* Query timing info shown here  */
                  "build_scorer" : 20872417,
                  "create_weight" : 196542,
                  "avg_next_doc_count" : 7,
                  "min_score_count" : 2
                }
              }
            ],
            "rewrite_time" : 12207,
            "collector" : [
              {
                ...
              }
            ]
          }
        ],
        "plugins" : [
          {
            "knn-query" : [
              {
                "type" : "KNNQuery",
                "description" : "",
                "breakdown" : {
                  "search_leaf_count" : 10,
                  "ann_search" : 58718123,
                  "bitset_creation" : 21792,
                  /* Plugin-specific info shown here */
                  "search_leaf" : 60492126,
                  "cardinality" : 0,
                  "exact_search" : 0
                }
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }


Related component

Search:Query Insights

Describe alternatives you've considered

Concurrency

Running a query non-concurrently means that a single profile breakdown is generated per LeafReaderContext, and each segment belonging to that context is processed sequentially, with its metrics updated sequentially. When running concurrently, however, the segments are split into slices that run in parallel, so multiple per-slice breakdowns are generated (one per slice). Once profiling is done on each slice, the profile results need to be aggregated: for each metric, the min/max/avg across slices is calculated, providing concurrency information in the breakdown. In addition, each metric is summed across slices to get the final result for that particular query in the profile tree (what the non-concurrent execution would normally produce). With Solution 1 this is quite easy, because the plugin profile breakdown belongs to the query profile breakdown, so aggregating the query results also aggregates the plugin breakdown metrics. With Solution 2, however, the plugin has its own profiler and must implement all of the aggregation of its metrics itself. For Solution 2, some sort of abstracted class could be created that does the aggregation for a single metric; this would make it simpler for plugins using Solution 2 to create a concurrent profiler. Without this, the plugin author would need to write concurrent versions of each class: profiler, profile tree, and profile breakdown.
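The per-metric aggregation described above could be factored into a small helper like the following sketch. The key naming (min_/max_/avg_ prefixes) is illustrative, not the exact OpenSearch convention:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SliceAggregator {

    // Aggregates one metric across per-slice breakdowns: the sum becomes the
    // final value for the query node, while min/max/avg expose how evenly the
    // work was distributed across slices. Assumes a non-empty slice list.
    public static Map<String, Long> aggregate(String metric, List<Map<String, Long>> sliceBreakdowns) {
        long sum = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Map<String, Long> slice : sliceBreakdowns) {
            long v = slice.getOrDefault(metric, 0L);
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        Map<String, Long> out = new HashMap<>();
        out.put(metric, sum);
        out.put("min_" + metric, min);
        out.put("max_" + metric, max);
        out.put("avg_" + metric, sum / sliceBreakdowns.size());
        return out;
    }
}
```

A core-provided helper of this shape would let a Solution 2 plugin get correct concurrent aggregation by iterating its metric keys, instead of reimplementing the whole concurrent breakdown machinery.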

Enum elimination

Enums are currently used for building timers and for tracking which timer to start. Ultimately, only the QueryTimingType enum gets used inside the concurrent breakdown builder, which prevents plugins from adding their own timers and having them displayed in the breakdown. Instead, it would be ideal not to have an enum structure in core at all: the timing breakdown could dynamically create and track timers keyed by, say, a string passed in. The concurrent breakdown would then no longer be tied to one specific enum and could iterate through all timers.

Additional context

https://github.com/opensearch-project/OpenSearch/pull/17146

neuenfeldttj · Jun 06 '25 18:06

Thanks a lot for the RFC @neuenfeldttj , a few questions / points:

abstract public Map<String, Long> toImportantMetricsMap();

What are those important metrics? Why do we need them instead of bundling all the data into breakdown map? It looks a bit off to be fair.

  1) Single Profiler design vs 2) Multi-profiler design

I think from a usability perspective, the core should take as much of the burden as possible off of plugin / extension developers:

  • any plugin could contribute one or more profilers (this choice probably should be contextual)
  • the core would instantiate the applicable ones during the query profiling
  • the core should provide the mechanism(s) for plugins / extensions to access their own profilers and contribute along the way
  • the core should take care of the aggregation at the end

Enum elimination

This is indeed one of the major limitations. I think this is a very valid decision, BUT it will be a breaking change. We should think a bit more about how we could approach it.

reta · Jun 08 '25 14:06

What are those important metrics? Why do we need them instead of bundling all the data into breakdown map? It looks a bit off to be fair.

This could be named differently as it's a little confusing. For the query profiler, these "important metrics" are:

"avg_slice_time_in_nanos",
"min_slice_time_in_nanos",
"time_in_nanos",
"max_slice_time_in_nanos"

They are all calculated when generating the breakdown map and the current design places them outside the breakdown object. For a ProfileResult to be statically parsed we need to know everything placed outside the breakdown map. Some approaches:

  1. Keep everything inside the breakdown - if the breakdown is huge, it makes it difficult to find these stats.
  2. Create another map of just these important stats - this allows plugins to add things outside the breakdown object if they want.
  3. Don't let plugins contribute outside the breakdown - the query profiler would be the only one with these metrics outside the breakdown object, and plugins would have to display theirs inside the profile breakdown.

Thoughts?

As for your second point, I agree. I feel like solution 1 takes away much of the burden for plugin developers.

neuenfeldttj · Jun 09 '25 15:06

[Search Triage] @ansjcy FYI!

sandeshkr419 · Jun 11 '25 16:06

Thanks for the work here.

I'm thinking that, at a high level, we could simplify this by creating an extension point that allows plugins to simply add their required telemetry to the breakdown map. For example:

public interface ProfileBreakdown {
    //Each ProfileBreakdown will implement a breakdown of what it wants to display 
    Map<String, String> toBreakdownMap();
}

This interface would be available for plugins to implement and extend with their own metrics. For example, with what we currently have with Timer:

public class Timer implements ProfileBreakdown

The ProfileResult doesn't actually care about the implementation of the metrics themselves, but rather their output, as it just needs the breakdown map to know what to display to the user.

Then in the search path we can access the map to retrieve the needed ProfileBreakdown and type cast it back whenever we need a specific instance of that metric. We should then be free to leverage actions like starting/stopping timers, incrementing count, etc.

With this we may be able to maintain whatever structure we have currently (and maybe even simplify it a bit!). Obviously, since we would be sharing this breakdown map with other developers, we'd need to be able to uniquely identify the keys that come in through the extension point.

markwu-sde · Jun 11 '25 21:06

Since we've integrated with OpenTelemetry spans/traces, I wonder if we might be able to just use those for profiling instead?

As called out, the current profiler is pretty rigid and inflexible. I'm wondering if it would be worthwhile to just replace it with something better.

msfroh · Jun 11 '25 23:06

Since we've integrated with OpenTelemetry spans/traces, I wonder if we might be able to just use those for profiling instead?

I think those could be separate efforts, OpenTelemetry needs a whole lot of infrastructure whereas query profiles are on demand and very easy to get, with no additional cost to the user.

reta · Jun 11 '25 23:06

@neuenfeldttj Thanks for the RFC!

  1. Multi-profiler design

The multi-profiler design seems to give more flexibility at the cost of more heavy lifting. One thing to consider is whether there will be any inconsistencies if plugin owners provide their own AbstractProfiler, especially in the case of concurrent segment search. If a plugin's implementation deviates from what core currently does, it can be misleading for users. I am curious whether there is a way for plugins to reuse or extend the existing concurrent profiler so there aren't many inconsistencies.

Another point I want to bring up relates to user experience. Separate plugin metrics also include timing information, which may be confusing for users. But I assume that with a separate profiler there is no other option, considering the rigidity of the existing design.

shatejas · Jun 13 '25 17:06

Keep everything inside the breakdown - if the breakdown is huge, it makes it difficult to find these stats

I would prefer to keep everything in the breakdown: yes, it will be larger (but it will get larger in any case), but at least it is very straightforward. It won't be difficult to find the stats by name, I believe (though one has to know what to look for; this is the documentation part).

reta · Jun 14 '25 12:06