Adding profiling support for concurrent segment search
Description
This code change introduces an AbstractQueryProfilerBreakdown that can be extended by ConcurrentQueryProfilerBreakdown to show query profiling information for concurrent search executions.
Issue
Relates to https://github.com/apache/lucene/issues/14375
Can you explain why we need two impls? I would have assumed that the ConcurrentQueryProfilerBreakdown could also be used for searches that are not concurrent?
Can you explain why we need two impls? I would have assumed that the ConcurrentQueryProfilerBreakdown could also be used for searches that are not concurrent?
ConcurrentQueryProfilerBreakdown maintains a separate instance of QueryProfileBreakdown for each segment. In its context method, ConcurrentQueryProfilerBreakdown returns the corresponding QueryProfileBreakdown object for each segment, unlike DefaultQueryProfilerBreakdown, which shares the same object across segments:
@Override
public QueryProfilerBreakdown context(LeafReaderContext context) {
  return this;
}
Hence, I felt we don't need to take on the overhead of creating a breakdown per segment and then merging them together when building the response. That being said, we can eventually keep just ConcurrentQueryProfilerBreakdown if we prefer that for simplicity.
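For illustration, the per-segment idea can be sketched roughly as below; the class and field names are illustrative (not the exact PR code), and it assumes QueryProfilerBreakdown can be instantiated once per leaf.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.sandbox.search.QueryProfilerBreakdown;

// Illustrative per-segment variant: each leaf gets its own breakdown instead of
// sharing a single instance across all segments.
class PerSegmentProfilerBreakdown {
  private final Map<LeafReaderContext, QueryProfilerBreakdown> perSegment = new ConcurrentHashMap<>();

  public QueryProfilerBreakdown context(LeafReaderContext context) {
    // Lazily create one breakdown per segment and reuse it on subsequent calls.
    return perSegment.computeIfAbsent(context, leaf -> new QueryProfilerBreakdown());
  }
}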
Does it make sense to create a separate QueryProfilerBreakDown per leaf? Or should it create one per slice?
Can this be implemented as part of the ProfilerCollectorManager#newCollector logic? Maybe not, since we would also need to profile the work done by the Weight + Scorer on each slice.
@jainankitk OK. In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really like non-concurrent search to look and feel like a concurrent search with a single slice running on a SameThreadExecutorService as far as profiling is concerned. So I wouldn't specialize the class hierarchy for concurrent vs. non-concurrent.
Does it make sense to create a separate QueryProfilerBreakDown per leaf? Or should it create one per slice?
Actually, creating one per slice makes a lot of sense.
Can this be implemented as part of the ProfilerCollectorManager#newCollector logic? Maybe not, since we would also need to profile the work done by the Weight + Scorer on each slice.
We can always use the thread id to correctly track the mapping from slice to QueryProfilerBreakdown object.
In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really like non-concurrent search to look and feel like a concurrent search with a single slice running on a SameThreadExecutorService as far as profiling is concerned.
Let me try and see if we can maintain a per-slice QueryProfilerBreakdown object. With that, both the concurrent and non-concurrent paths would be consistent as well as efficient.
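A rough sketch of that thread-keyed mapping, assuming one slice runs entirely on one thread (class names follow the discussion, but the exact PR code may differ; a stub of QuerySliceProfilerBreakdown is included only to keep the sketch self-contained):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.LeafReaderContext;

// Stub standing in for the per-slice breakdown class referenced in the PR.
class QuerySliceProfilerBreakdown {}

// Illustrative breakdown that keys per-slice state by the executing thread,
// since each slice is searched by a single thread.
class SliceAwareProfilerBreakdown {
  private final Map<Long, QuerySliceProfilerBreakdown> perSlice = new ConcurrentHashMap<>();

  public QuerySliceProfilerBreakdown context(LeafReaderContext context) {
    long currentThreadId = Thread.currentThread().getId();
    // All leaves of the same slice are processed by the same thread, so they map to
    // the same slice-level breakdown.
    return perSlice.computeIfAbsent(currentThreadId, id -> new QuerySliceProfilerBreakdown());
  }
}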
One of the failing checks is:
----------
1. ERROR in /home/runner/work/lucene/lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerBreakdown.java (at line 63)
currentThreadId, ctx -> new QuerySliceProfilerBreakdown());
^^^
The value of the lambda parameter ctx is not used
----------
1 problem (1 error)
> Task :lucene:sandbox:ecjLintMain FAILED
I am wondering if there is a workaround for this? One option is to use putIfAbsent, which doesn't take a function as input, but then we need an explicit get before returning.
You just need to replace ctx with _.
You just need to replace ctx with _.
Ah, my bad! I tried ., but we can't use that as part of a variable name. Thanks for the suggestion @jpountz.
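For reference, the two options can be sketched as below, reusing the thread-keyed map from the earlier sketch; the unnamed _ parameter requires a JDK with unnamed-variable support (Java 22+).

// Option 1: unnamed lambda parameter, which satisfies the unused-parameter lint.
return perSlice.computeIfAbsent(currentThreadId, _ -> new QuerySliceProfilerBreakdown());

// Option 2: putIfAbsent takes a value rather than a function, but needs an explicit get
// afterwards and allocates a breakdown even when one already exists for this thread.
perSlice.putIfAbsent(currentThreadId, new QuerySliceProfilerBreakdown());
return perSlice.get(currentThreadId);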
At a high level, I have unified the concurrent/non-concurrent profiling paths as suggested. The QueryProfilerTree is shared across slices, and we recursively build the ProfilerTree for each slice for the response. There are a few kinks that we still need to iron out. For example:
- Weight creation is global across slices. How do we account for its time? Should we have a separate global tree with just the weight times? We can't just get away with having the weight count at the top, as Weight is shared for child queries as well, right?
- The new in-memory structure for profiled queries is a bit like below (notice the additional list for slices):
"query": [ <-- for list of slices
[ <-- for list of root queries
{
"type": "TermQuery",
"description": "foo:bar",
"time_in_nanos" : 11972972,
"breakdown" :
{
We can probably have a map of slices, with the key being the sliceId:
"query": {
"some global information":
"slices": {
"slice1": [ <-- for list of root queries
{
"type": "TermQuery",
"description": "foo:bar",
"time_in_nanos" : 11972972,
"breakdown" :
{...}}],
"slice2": [],
"slice3": []}
}
@jpountz - Can you provide your thoughts on above?
I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for each slice. Related, it'd be nice if each per-slice object could also tell us about the thread that it ran in and its start time so that we could understand how exactly Lucene managed to parallelize the search.
I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for each slice.
While working on the code, I realized it is better to have the list of slices within the tree itself at each level, instead of repeating the query structure and information across multiple trees. In this approach, we can easily view the tree for a specific sliceId using jq or a simple script. The structure looks like below:
"query": [ <-- for list of root queries
{
"type": "TermQuery",
"description": "foo:bar",
"startTime" : 11972972,
"totalTime": 354343,
"breakdown" : {.....}, <-- query level breakdown like weight count and time
"sliceBreakdowns": [
{.....}, <-- first slice information
{.....}] <-- second slice information
"queryChildren": [
{......}, <-- recursive repetition of above structure
{......}]
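In Java terms, the in-memory shape of such a tree node would look roughly like the record below (the record and field names are illustrative, not the exact PR classes).

import java.util.List;
import java.util.Map;

// Illustrative shape of one node in the profile tree sketched above.
record ProfileTreeNode(
    String type,                              // e.g. "TermQuery"
    String description,                       // e.g. "foo:bar"
    long startTime,                           // approximate start time in nanos
    long totalTime,                           // total time in nanos
    Map<String, Long> breakdown,              // query-level breakdown, e.g. weight count and time
    List<Map<String, Long>> sliceBreakdowns,  // one breakdown per slice
    List<ProfileTreeNode> queryChildren) {}   // recursive repetition of this structure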
Related, it'd be nice if each per-slice object could also tell us about the thread that it ran in and its start time so that we could understand how exactly Lucene managed to parallelize the search.
Yes, that would be really useful. I have included the threadId as the sliceId, along with startTime and totalTime information for each slice object at every query level.
@jpountz - The code changes are ready for review. For now, I have made changes to accommodate all the timers in QueryProfilerTimingType.
While this does not cover rewrite() and CollectorManager#reduce (called out earlier), as they are not part of QueryProfilerTimingType, it does lay the groundwork via createWeight(). We can take those changes up in a follow-up PR.
It looks like it's trying to aggregate data differently for slice-level operations vs. top-level operations, but we shouldn't do this anymore now that we're breaking down by thread?
IMO, QuerySliceProfilerBreakdown and sliceLevel in QueryProfilerTimingType just declare which metrics can be collected at the slice level vs. the query level. The specific implementation of QueryProfileBreakdown we have provides a per-thread aggregated view of these slices. If tomorrow we have a separate implementation of QueryProfileBreakdown that does not aggregate slices together, it can continue to use the same specifications. Maybe we can add a thin layer like below to make things more explicit and clear, though I am not sure it is needed:
public class QueryThreadProfilerBreakdown extends QuerySliceProfilerBreakdown {}
After giving it some more thought, I added a new interface, QueryLeafProfilerAggregator, implementations of which allow aggregating leaf-level breakdowns at the THREAD, SLICE, or LEAF level. The profiler result has an aggregationType field that indicates the level of aggregation, and what the id in the aggregated result corresponds to: thread, slice, or leaf.
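Conceptually, the aggregation levels look something like the sketch below (the interface name follows the comment above, but the method shapes are illustrative).

// Illustrative sketch of the aggregation levels and the aggregator contract.
enum AggregationType { THREAD, SLICE, LEAF }

interface QueryLeafProfilerAggregator {
  // The level that leaf breakdowns are rolled up to; the profiler result's
  // aggregationType field reports this value, and the id in the aggregated result
  // corresponds to a thread id, slice id, or leaf (segment) accordingly.
  AggregationType aggregationType();
}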
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
@jpountz - Thanks for the review earlier. Will really appreciate your feedback on the updates.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
I submitted a talk on this topic (Profiling Concurrent Search in Lucene: A Deep Dive into Parallel Execution) for the ASF conference (https://communityovercode.org/schedule/) and it was selected. Would love to iterate and get this PR merged before that!
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Sorry, I had lost track of this PR. I think that my only concern left is how it computes end times by adding the approximate timing to the start time. I'd rather not report it since this profiler doesn't actually compute an end time, but I'd be fine with reporting the sum of approximate timings if we think that it helps.
No worries @jpountz, thanks for the review. Even I was away for a bit, will try to address the comments and push new revision this week.
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
@jpountz - I have addressed all the comments from earlier review. Are you able to take another look, to help close this out?
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Thanks all for reviewing this PR. Planning to merge this PR by tomorrow, if there is no new feedback. Again, thanks for helping improve this change with your inputs!
Hi @jainankitk, is QueryProfilerResult no longer JSON (de)serializable? The reason is the introduction of AggregatedQueryLeafProfilerResult, which contains a Thread object. I don't see its Thread object being used anywhere; can it be removed?
Or, if we do need to maintain the Thread association, can we store the Thread id (an int, I think) instead of the Thread?
Thanks @dungba88 for trying this out.
I don't see its Thread object being used anywhere; can it be removed?
The Thread object is used for maintaining the association (as @msokolov called out) during aggregation of the segment-level profiler results. I was not exactly sure which fields from Thread might be needed or make sense for serialization, so I kept Thread itself in AggregatedQueryLeafProfilerResult. I am wondering if it makes sense to mark the Thread as @JsonIgnore / transient so it is not serialized (and avoid exceptions/errors), and instead have a limited view of Thread, say a SerializableThread containing all the properties that we want to have in the serialization?
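As a sketch of that last idea, a limited serialization-friendly view of Thread could look roughly like this (the record name and fields are illustrative):

// Illustrative snapshot of the Thread fields that are likely useful in the profile output.
record SerializableThread(long id, String name) {
  static SerializableThread from(Thread thread) {
    // Thread#getId is deprecated in favor of Thread#threadId on newer JDKs, but both work here.
    return new SerializableThread(thread.getId(), thread.getName());
  }
}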