
Adding profiling support for concurrent segment search

Open jainankitk opened this issue 8 months ago • 20 comments

Description

This code change introduces an AbstractQueryProfilerBreakdown that can be extended by ConcurrentQueryProfilerBreakdown to expose query profiling information for concurrent search executions.

Issue

Relates to https://github.com/apache/lucene/issues/14375

jainankitk avatar Mar 27 '25 01:03 jainankitk

Can you explain why we need two impls? I would have assumed that the ConcurrentQueryProfilerBreakdown could also be used for searches that are not concurrent?

jpountz avatar Mar 27 '25 06:03 jpountz

Can you explain why we need two impls? I would have assumed that the ConcurrentQueryProfilerBreakdown could also be used for searches that are not concurrent?

ConcurrentQueryProfilerBreakdown maintains a separate QueryProfilerBreakdown instance for each segment. In the context method, ConcurrentQueryProfilerBreakdown returns the corresponding QueryProfilerBreakdown object for each segment, unlike DefaultQueryProfilerBreakdown, which shares the same object across segments:

  // DefaultQueryProfilerBreakdown: the same breakdown instance is returned for
  // every segment, so per-segment timings are folded together.
  @Override
  public QueryProfilerBreakdown context(LeafReaderContext context) {
    return this;
  }

Hence, I felt we don't need to take on the overhead of creating a breakdown per segment and then merging them together when building the response. That being said, we can eventually keep just ConcurrentQueryProfilerBreakdown if we prefer that for simplicity.
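
For illustration, the per-segment variant looks roughly like the below (the map field and its name are a sketch, not the PR's actual code):

  // (imports: java.util.Map, java.util.concurrent.ConcurrentHashMap)
  // One breakdown per leaf/segment, created lazily; a ConcurrentHashMap since
  // leaves may be profiled from multiple threads.
  private final Map<LeafReaderContext, QueryProfilerBreakdown> perLeaf = new ConcurrentHashMap<>();

  @Override
  public QueryProfilerBreakdown context(LeafReaderContext context) {
    return perLeaf.computeIfAbsent(context, leaf -> new QueryProfilerBreakdown());
  }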

jainankitk avatar Mar 27 '25 16:03 jainankitk

Does it make sense to create a separate QueryProfilerBreakdown per leaf? Or should it create one per slice?

Can this be implemented as part of the ProfilerCollectorManager#newCollector logic? Maybe not, since we would also need to profile the work done by the Weight + Scorer on each slice.

msfroh avatar Mar 27 '25 20:03 msfroh

@jainankitk OK. In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really like non-concurrent search to look and feel like a concurrent search with a single slice running on a SameThreadExecutorService as far as profiling is concerned. So I wouldn't specialize the class hierarchy for concurrent vs. non-concurrent.

jpountz avatar Mar 27 '25 20:03 jpountz

Does it make sense to create a separate QueryProfilerBreakdown per leaf? Or should it create one per slice?

Actually, creating one per slice makes a lot of sense.

Can this be implemented as part of the ProfilerCollectorManager#newCollector logic? Maybe not, since we would also need to profile the work done by the Weight + Scorer on each slice

We can always use the thread id to correctly track the mapping from slice to QueryProfilerBreakdown object.
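
Roughly like the below (a sketch; field and method names are illustrative):

  // (imports: java.util.concurrent.ConcurrentHashMap)
  // Keyed by thread id; note a pool thread may execute more than one slice,
  // so this aggregates per thread rather than strictly per slice.
  private final ConcurrentHashMap<Long, QueryProfilerBreakdown> breakdownBySlice = new ConcurrentHashMap<>();

  QueryProfilerBreakdown forCurrentSlice() {
    return breakdownBySlice.computeIfAbsent(Thread.currentThread().threadId(), id -> new QueryProfilerBreakdown());
  }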

jainankitk avatar Mar 27 '25 22:03 jainankitk

In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really like non-concurrent search to look and feel like a concurrent search with a single slice running on a SameThreadExecutorService as far as profiling is concerned.

Let me try and see if we can maintain a per-slice QueryProfilerBreakdown object. With that, both the concurrent and non-concurrent paths would be consistent as well as efficient.

jainankitk avatar Mar 27 '25 22:03 jainankitk

One of the failing checks is:

----------
1. ERROR in /home/runner/work/lucene/lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerBreakdown.java (at line 63)
	currentThreadId, ctx -> new QuerySliceProfilerBreakdown());
	                 ^^^
The value of the lambda parameter ctx is not used
----------
1 problem (1 error)

> Task :lucene:sandbox:ecjLintMain FAILED

I am wondering if there is a workaround for this? One option is to use putIfAbsent, which doesn't require a function as input, but then we need an explicit get before returning.
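
Roughly (a sketch; breakdowns is an assumed name for the map in question):

  // putIfAbsent takes a value rather than a lambda, so there is no unused
  // parameter to lint; the downside is that it eagerly allocates a breakdown
  // even when the key is already present.
  breakdowns.putIfAbsent(currentThreadId, new QuerySliceProfilerBreakdown());
  return breakdowns.get(currentThreadId);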

jainankitk avatar Mar 28 '25 07:03 jainankitk

You just need to replace ctx with _.
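
That is (using the unnamed lambda parameter _, available since Java 22; breakdowns is an assumed name for the map):

  breakdowns.computeIfAbsent(currentThreadId, _ -> new QuerySliceProfilerBreakdown());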

jpountz avatar Mar 28 '25 13:03 jpountz

You just need to replace ctx with _.

Ah, my bad! I tried ., but we can't use that as part of a variable name. Thanks for the suggestion @jpountz.

At a high level, I have unified the concurrent/non-concurrent profiling paths as suggested. The QueryProfilerTree is shared across slices, and we recursively build the ProfilerTree for each slice for the response. There are a few kinks that we still need to iron out. For example:

  • Weight creation is global across slices. How do we account for its time? Should we have a separate global tree with just the weight times? We can't just get away with having the weight count at the top, as the Weight is shared by the child queries as well, right?
  • The new in-memory structure for profiled queries looks a bit like the below (notice the additional list for slices):
"query": [ <-- for list of slices
               [ <-- for list of root queries
              {
                "type": "TermQuery",
                "description": "foo:bar",
                "time_in_nanos" : 11972972,
                "breakdown" :
                {

We can probably have a map of slices instead, with the key being the sliceId:

       "query": {
               "some global information":
               "slices": {
               "slice1": [ <-- for list of root queries
                {
                "type": "TermQuery",
                "description": "foo:bar",
                "time_in_nanos" : 11972972,
                "breakdown" :
                {...}}],
               "slice2": [],
               "slice3": []}
              }

jainankitk avatar Mar 28 '25 17:03 jainankitk

@jpountz - Can you provide your thoughts on the above?

jainankitk avatar Mar 31 '25 21:03 jainankitk

I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for each slice. Related, it'd be nice if each per-slice object could also tell us about the thread that it ran in and its start time so that we could understand how exactly Lucene managed to parallelize the search.
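
A sketch of that shape (names here are illustrative, not the PR's API):

  // (imports: java.util.List)
  // Per-slice tree, plus the thread it ran on and when it started.
  record SliceProfile(String threadName, long startTimeNanos, Object profileTree) {}

  // Top-level timings for the phases that are not parallelized, plus one
  // entry per slice.
  record SearchProfile(
      long rewriteNanos,       // rewrite()
      long createWeightNanos,  // createWeight()
      long reduceNanos,        // CollectorManager#reduce
      List<SliceProfile> slices) {}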

jpountz avatar Apr 01 '25 13:04 jpountz

I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for each slice.

While working on the code, I realized it is better to have the list of slices within the tree itself at each level, instead of repeating the query structure and information across multiple trees. With this approach, we can easily view the tree for a specific sliceId using jq or a simple script. The structure looks like the below:

"query": [ <-- for list of root queries
              {
                "type": "TermQuery",
                "description": "foo:bar",
                "startTime" : 11972972,
                "totalTime": 354343,
                "breakdown" :   {.....}, <-- query level breakdown like weight count and time
                "sliceBreakdowns": [
                {.....},    <-- first slice information
                {.....}]    <-- second slice information
                "queryChildren": [
                {......},  <-- recursive repetition of above structure
                {......}]

Related, it'd be nice if each per-slice object could also tell us about the thread that it ran in and its start time so that we could understand how exactly Lucene managed to parallelize the search.

Yes, that would be really useful. I have included the threadId as the sliceId, along with startTime and totalTime information for each slice object at every query level.

jainankitk avatar Apr 03 '25 06:04 jainankitk

@jpountz - The code changes are ready for review. For now, I have made changes to accommodate all the timers in QueryProfilerTimingType.

While this does not cover rewrite() and CollectorManager#reduce, called out earlier, as they are not part of QueryProfilerTimingType, it does lay the groundwork via createWeight(). We can take those changes in a follow-up PR.

jainankitk avatar Apr 03 '25 06:04 jainankitk

It looks like it's trying to aggregate data differently for slice-level operations vs. top-level operations, but we shouldn't do this anymore now that we're breaking down by thread?

IMO, QuerySliceProfilerBreakdown and sliceLevel in QueryProfilerTimingType just declare which metrics can be collected at the slice level vs. the query level. The specific implementation of QueryProfilerBreakdown we have provides a per-thread aggregated view of these slices. Tomorrow, if we have a separate implementation of QueryProfilerBreakdown that does not aggregate slices together, it can continue to use the same specifications. Maybe we can add a thin layer like the below to make things more explicit and clear, though I am not sure it is needed:

public class QueryThreadProfilerBreakdown extends QuerySliceProfilerBreakdown {}

jainankitk avatar Apr 07 '25 22:04 jainankitk

After giving it some more thought, I added a new interface, QueryLeafProfilerAggregator, whose implementations allow aggregating leaf-level breakdowns at the THREAD, SLICE, or LEAF level. The profiler result has an aggregationType field that indicates the level of aggregation and what the id in the aggregated result corresponds to: thread, slice, or leaf.
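
A minimal sketch of the shape (apart from QueryLeafProfilerAggregator and the three levels, names here are assumptions, not the actual signatures):

  enum AggregationType { THREAD, SLICE, LEAF }

  interface QueryLeafProfilerAggregator {
    // Level at which leaf-level breakdowns are folded together; this also
    // defines what the id in each aggregated result refers to.
    AggregationType aggregationType();
  }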

jainankitk avatar Apr 14 '25 02:04 jainankitk

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Apr 29 '25 00:04 github-actions[bot]

@jpountz - Thanks for the review earlier. Will really appreciate your feedback on the updates.

jainankitk avatar Apr 29 '25 07:04 jainankitk

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar May 14 '25 00:05 github-actions[bot]

I submitted a talk on this topic (Profiling Concurrent Search in Lucene: A Deep Dive into Parallel Execution) for the ASF conference (https://communityovercode.org/schedule/) and it was selected. Would love to iterate and get this PR merged before then!

jainankitk avatar May 27 '25 18:05 jainankitk

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Jun 11 '25 00:06 github-actions[bot]

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Jun 28 '25 00:06 github-actions[bot]

Sorry, I had lost track of this PR. I think that my only concern left is how it computes end times by adding the approximate timing to the start time. I'd rather not report it since this profiler doesn't actually compute an end time, but I'd be fine with reporting the sum of approximate timings if we think that it helps.

No worries @jpountz, thanks for the review. I was away for a bit as well; I will try to address the comments and push a new revision this week.

jainankitk avatar Jul 21 '25 19:07 jainankitk

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Jul 28 '25 02:07 github-actions[bot]

@jpountz - I have addressed all the comments from the earlier review. Are you able to take another look to help close this out?

jainankitk avatar Aug 05 '25 19:08 jainankitk

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Aug 20 '25 00:08 github-actions[bot]

Thanks all for reviewing this PR. Planning to merge it by tomorrow if there is no new feedback. Again, thanks for helping improve this change with your inputs!

jainankitk avatar Aug 25 '25 21:08 jainankitk

Hi @jainankitk, is QueryProfilerResult no longer JSON (de)serializable? The reason is the introduction of AggregatedQueryLeafProfilerResult, which contains a Thread object. I don't see the Thread object being used anywhere; can it be removed?

dungba88 avatar Oct 14 '25 01:10 dungba88

Or if we do need to maintain the Thread association, can we store the Thread.id (a long, I think) instead of the Thread?

msokolov avatar Oct 14 '25 17:10 msokolov

Thanks @dungba88 for trying this out.

I don't see its Thread object is being used anywhere, can it be removed?

The Thread object is used for maintaining the association (as @msokolov called out) during aggregation of the segment-level profiler results. I was not exactly sure which fields from Thread might be needed or make sense for serialization, so I kept the Thread itself in AggregatedQueryLeafProfilerResult. I am wondering if it makes sense to mark the Thread as @JsonIgnore / transient so it is not serialized (avoiding exceptions/errors), and instead have a limited view of Thread, say a SerializableThread containing all the properties that we want in the serialization?
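
A minimal sketch of that idea (the field choice is an assumption, not part of the PR):

  // Serialization-friendly view of a Thread; only plain values, no live handle.
  record SerializableThread(long id, String name) {
    static SerializableThread of(Thread t) {
      return new SerializableThread(t.threadId(), t.getName());
    }
  }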

jainankitk avatar Oct 15 '25 00:10 jainankitk