OpenSearch
Server-side cancellation of in-flight search requests based on resource consumption
Is your feature request related to a problem? Please describe. #1042 aims to build back-pressure support for search requests. This will help in recovering a node that is running short on system resources while already-running search requests are not finishing and are making things worse.
Describe the solution you'd like
Cancel the most resource-intensive in-flight search requests on a shard/node if that shard/node has started breaching its assigned resource limits (https://github.com/opensearch-project/OpenSearch/issues/1180) and there is no recovery within a certain time threshold. The back-pressure model should identify the most resource-guzzling queries with minimal wasted work. These can then be cancelled to recover a node under load so it can continue doing useful work.
Describe alternatives you've considered
Additional context https://github.com/opensearch-project/OpenSearch/issues/1180 - This issue covers rejection of incoming requests.
Following are some detailed thoughts on proposed approach:
Problem Statement
A single resource-intensive search query can guzzle a lot of resources, and a handful of such queries can degrade the performance of the whole cluster. Currently we do not have a mechanism to identify and terminate the problematic queries when a node is in duress. Existing mechanisms such as circuit breakers and thread pool size thresholds act as blanket mechanisms and do not specifically target the problematic queries.
Goals
Milestone 1: Identify and reject on-going resource-intensive tasks on a shard/node if they have breached limits and do not recover within a certain threshold. Only the task on that particular shard is rejected; tasks on other shards can still execute successfully.
Milestone 2: The previous milestone only tracks shard-level tasks. Add tracking of co-ordinator task resource consumption and cancellation logic based on the threshold.
Milestone 3: Build an aggregated view of resource consumption for a query by rolling up the consumption stats from all shards and aggregating them under the parent task id. This aggregated view can be used to build guard-rails to track and cancel requests that consume a lot of resources across nodes.
Non-Goals
We are not targeting to build backpressure for spikes in search request rate as a part of this task. It would be handled as a part of rejection of incoming requests (https://github.com/opensearch-project/OpenSearch/issues/1180) task.
Key Considerations
- The Resource Tracking Framework would be utilized to provide stats on task resource consumption.
- Initially, we are targeting the shard search task and not the co-ordinator task.
Proposed Approach
[Following sections describe the approach for Milestone 1]
Measure the resource consumption (CPU, heap memory) at frequent checkpoints within the query phase of a shard search request. If the node is in duress (JVM memory pressure above threshold, CPU utilization at threshold) and the total heap memory occupied by search shard tasks is >= 5% of total heap, then check the following criteria for each search task: CPU cycles spent and heap memory occupied by the task. If the task has exceeded the CPU cycles threshold and is among the top tasks by heap memory occupied, with a large variance from the average resource consumption, then we cancel the search task. A sketch of this decision logic follows.
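The following is a minimal, illustrative sketch of the cancellation criteria described above. The class and constant names (CancellationDecision, HEAP_SHARE_THRESHOLD, etc.) are hypothetical stand-ins, not actual OpenSearch classes or settings.

```java
// Hypothetical sketch of the Milestone 1 cancellation criteria; names and
// threshold values are illustrative only.
final class CancellationDecision {
    static final double HEAP_SHARE_THRESHOLD = 0.05;                // search tasks >= 5% of total heap
    static final long CPU_CYCLES_THRESHOLD_NANOS = 10_000_000_000L; // 10s of CPU time per task
    static final double VARIANCE_THRESHOLD = 1.5;                   // task vs. average consumption

    static boolean shouldCancel(boolean nodeInDuress,
                                long searchTasksHeapBytes,
                                long totalHeapBytes,
                                long taskCpuNanos,
                                long taskHeapBytes,
                                double avgTaskHeapBytes,
                                boolean isTopNByHeap) {
        if (!nodeInDuress) {
            return false; // only act when the node itself is under pressure
        }
        if ((double) searchTasksHeapBytes / totalHeapBytes < HEAP_SHARE_THRESHOLD) {
            return false; // search is not a significant contributor to heap pressure
        }
        // Cancel only tasks that are both CPU-heavy and clear heap outliers.
        return taskCpuNanos >= CPU_CYCLES_THRESHOLD_NANOS
            && isTopNByHeap
            && taskHeapBytes >= VARIANCE_THRESHOLD * avgTaskHeapBytes;
    }
}
```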
Different checkpoints to consider in Query Phase
Approach 1 - Using multiple checkpoints within the same task thread: The query phase of each search shard task goes through different sub-phases (pre-aggregation, search, rescoring, suggest, post-aggregation), and we will checkpoint after each sub-phase completes. Among these, the search sub-phase does the actual Lucene search and is very intensive, so checkpointing only after it completes is not enough; we will also add a cancellable callback that periodically checkpoints during the search execution itself (a hedged sketch of such a callback appears after Approach 2 below).
Approach 2 - Using a separate observer thread: We will use a separate observer thread to monitor the tasks at a fixed frequency. We will not evaluate at different checkpoints, but instead track tasks at a fixed frequency.
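To illustrate Approach 1, the sketch below shows a periodic cancellation check that could be invoked from within the search loop, for example every N documents collected. The interface and names are hypothetical; OpenSearch's actual query phase wires cancellation checks differently.

```java
// Hypothetical checkpoint callback for Approach 1; invoked by the task's own
// thread every CHECK_INTERVAL_DOCS documents while the Lucene search runs.
final class ResourceCheckpoint {
    private static final int CHECK_INTERVAL_DOCS = 1 << 12; // check every 4096 docs
    private final Runnable resourceCheck; // e.g. measures task CPU/heap and throws if over limit
    private int docsSinceLastCheck = 0;

    ResourceCheckpoint(Runnable resourceCheck) {
        this.resourceCheck = resourceCheck;
    }

    /** Called once per collected document from the search sub-phase. */
    void onDocCollected() {
        if (++docsSinceLastCheck >= CHECK_INTERVAL_DOCS) {
            docsSinceLastCheck = 0;
            resourceCheck.run(); // may throw a cancellation exception for this task
        }
    }
}
```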
Deciding if node is in duress
The node's current JVM memory pressure (MP) and CPU utilization are used as the criteria to determine whether the node is in duress.
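A minimal sketch of such a duress check, assuming the thresholds and the sustained-breach requirement described in the implementation details later in this issue (breach held for about 3 minutes of one-second samples). The class name and sampling interval are illustrative.

```java
// Illustrative node-duress detector: the node is considered in duress only if
// JVM memory pressure or CPU utilization stays above its threshold for a
// number of consecutive observations (e.g. 180 one-second samples ~= 3 minutes).
final class NodeDuressDetector {
    private static final double JVM_MP_THRESHOLD = 0.70;
    private static final double CPU_THRESHOLD = 0.80;
    private static final int REQUIRED_CONSECUTIVE_BREACHES = 180;

    private int jvmBreachStreak = 0;
    private int cpuBreachStreak = 0;

    /** Called once per observer tick with the latest node-level measurements. */
    boolean isNodeInDuress(double jvmMemoryPressure, double cpuUtilization) {
        jvmBreachStreak = jvmMemoryPressure > JVM_MP_THRESHOLD ? jvmBreachStreak + 1 : 0;
        cpuBreachStreak = cpuUtilization > CPU_THRESHOLD ? cpuBreachStreak + 1 : 0;
        return jvmBreachStreak >= REQUIRED_CONSECUTIVE_BREACHES
            || cpuBreachStreak >= REQUIRED_CONSECUTIVE_BREACHES;
    }
}
```

Requiring consecutive breaches also guards against short-lived spikes (for example JVM MP fluctuations caused by GC) being mistaken for sustained pressure.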
Identifying top resource consuming tasks
When TaskResourceTrackingService measures the resource stats, it will also keep track of the top-N tasks by heap memory consumption. This is used to identify and cancel the most resource-intensive tasks if their variance from the average is considerably higher.
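One cheap way to maintain such a top-N view is a bounded min-heap keyed on heap usage, as sketched below. This is an illustrative data-structure example, not the actual TaskResourceTrackingService implementation; TaskHeapUsage is a hypothetical (taskId, heapBytes) pair.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative top-N tracker: keeps the N tasks with the highest heap usage.
final class TopNHeapConsumers {
    record TaskHeapUsage(long taskId, long heapBytes) {}

    private final int n;
    // Min-heap ordered by heap usage, so the smallest of the current top-N is at the head.
    private final PriorityQueue<TaskHeapUsage> heap =
        new PriorityQueue<>(Comparator.comparingLong(TaskHeapUsage::heapBytes));

    TopNHeapConsumers(int n) {
        this.n = n;
    }

    void offer(TaskHeapUsage usage) {
        if (heap.size() < n) {
            heap.add(usage);
        } else if (usage.heapBytes() > heap.peek().heapBytes()) {
            heap.poll();      // evict the smallest of the current top-N
            heap.add(usage);
        }
    }

    List<TaskHeapUsage> snapshot() {
        return new ArrayList<>(heap); // unordered view of the current top-N
    }
}
```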
Why not add cancellation logic in Fetch phase also?
Every search request goes through two phases - the query phase and the fetch phase. The query phase does the actual search and returns the matching document ids from each shard; the fetch phase enriches those document ids with document contents. The query phase is usually the heavy one, and its resource consumption varies with the nature of the query and the workload, hence we track it extensively. Also, once the query phase is completed the search request is close to finishing, and cancelling it at that point would waste all the resources spent so far.
PoC Testing
Code changes for PoC testing included the following logic — a heap to track the top-N requests, measuring resource utilization after every sub-phase, and cancelling the top-most resource-consuming query. (The PoC did not include the running-duration or variance logic.)
Executed two different types of queries, light and heavy, as follows:
Light weight query:
curl "localhost:9200/_search?q=*:*"
Comparatively heavy aggregation query:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json'
-d'{"query": {"range":{"dropoff_datetime":{"from":"01/01/2015","to":"21/01/2015",
"include_lower":true,"include_upper":true,"format":"dd/MM/yyyy","boost":1.0}}},
"aggregations":{"dropoffs_over_time":{"date_histogram":{"field":"dropoff_datetime",
"calendar_interval":"day","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0}}}}'
While the queries were executing, the top queries consuming a lot of heap were cancelled as shown below, whereas the light-weight queries always succeeded:
[2022-07-28T18:29:08,400][INFO ][o.o.s.q.QueryPhase ] [ip-172-31-51-111.us-west-2.compute.internal] This task 1317 is currently the highest resource consuming task and it is being cancelled
[2022-07-28T18:29:08,401][TRACE][o.o.s.SearchService ] [ip-172-31-51-111.us-west-2.compute.internal] Query phase failed
org.opensearch.tasks.TaskCancelledException: Task is cancelled due to high resource consumption
at org.opensearch.search.query.QueryPhase.measureResourceConsumption(QueryPhase.java:157) ~[opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:187) ~[opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:455) ~[opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:523) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:490) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.1.0.jar:2.1.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[ec2-user@ip-172-31-51-111 opensearch]$ curl -X GET "localhost:9200/_search?pretty"
-H 'Content-Type: application/json'
-d'{"query": {"range":{"dropoff_datetime":{"from":"01/01/2015","to":"21/01/2015",
"include_lower":true,"include_upper":true,"format":"dd/MM/yyyy","boost":1.0}}},
"aggregations":{"dropoffs_over_time":{"date_histogram":{"field":"dropoff_datetime",
"calendar_interval":"day","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0}}}}'
{
"error" : {
"root_cause" : [
{
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "nyc_taxis",
"node" : "MGkMkg9wREW3IVewZ7U_jw",
"reason" : {
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
}
]
},
"status" : 500
}
Other Approaches considered
- Measure resource consumption and add cancellation logic at the end of each phase/sub-phase.
  - As mentioned above, the Query Phase does all the heavy lifting, and measuring only at the end of the phase is too late to cancel the request.
- Cancel based on fixed thresholds for per-request resource consumption.
  - Each workload is different depending on index and hardware configuration, and fixed per-request thresholds might not be apt for all diverse workloads.
Points to Note
- [Milestone 1] If the search task is cancelled, it should be retried on another replica.
- [Milestone 1] Should the tracking of top-N requests be moved to the shard level instead of the node level? — Tracking top-N at the shard level might be over-complicating things and does not add much value. Instead, we can just track the total heap memory occupied by search requests at the shard level.
- [Milestone 3] We can also expose configuration for users to set a threshold for resource consumption beyond which the request will be cancelled.
- [Milestone 3] Cancelled queries can be added to slow logs, so that it is easier for users to debug.
Open Items
- From where can we measure the resource consumption for currently executing tasks? Should it be done on a separate observer thread? Or, can it happen on the same tasks’ thread?
- Separate observer thread: Using a separate observer thread to checkpoint resource consumption periodically at a fixed frequency. This would avoid any performance impact due to measuring resource consumption since it runs on a separate thread. It also provides improved code maintainability, since it does not require code interleaving within search logic.
- Using same task thread: We can measure resource consumption of task within the same task itself at different milestones. Additional overhead of measuring resource consumption needs to be handled by same task thread.
I will evaluate both approaches further and share more details.
Metrics to be added
The following would be rolling-window metrics over every 1 minute and would be exposed through the stats API.
- ExpensiveTaskCancelledCount — Counter indicating the number of expensive tasks that were cancelled.
- TopNTaskCPUUtilization — Absolute value of CPU utilization for all top-N requests in every minute.
- TopNTaskHeapMem — Absolute value of heap memory utilization for all top-N requests in every minute.
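A hedged sketch of how a one-minute rolling counter for a metric such as ExpensiveTaskCancelledCount could be kept; the bucketed approach below is illustrative, not the actual stats implementation.

```java
// Illustrative rolling one-minute counter: increments land in per-second
// buckets and the reported value is the sum of the buckets from the last 60 seconds.
final class RollingMinuteCounter {
    private static final int BUCKETS = 60;
    private final long[] counts = new long[BUCKETS];
    private final long[] bucketEpochSecond = new long[BUCKETS];

    synchronized void increment() {
        long nowSec = System.currentTimeMillis() / 1000;
        int idx = (int) (nowSec % BUCKETS);
        if (bucketEpochSecond[idx] != nowSec) { // bucket holds data from an older minute; reset it
            bucketEpochSecond[idx] = nowSec;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    synchronized long lastMinuteCount() {
        long nowSec = System.currentTimeMillis() / 1000;
        long sum = 0;
        for (int i = 0; i < BUCKETS; i++) {
            if (nowSec - bucketEpochSecond[i] < BUCKETS) { // only count buckets from the last minute
                sum += counts[i];
            }
        }
        return sum;
    }
}
```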
Additional changes
- Shard level tracking of resource consumption — Resource consumption is currently measured per task and we do not build a shard-specific view to see how much heap memory is occupied by each shard. Similar to Indexing Backpressure (https://github.com/opensearch-project/OpenSearch/issues/478), we can cancel the requests based on the pressure on a shard.
Please share your thoughts on the above.
cc-ing folks for comments @reta @dblock @andrross @nknize @sruti1312 @getsaurabh02
More thoughts on the above open item:
Approach for Resource Consumption Tracking
We need to track the resource consumption of currently executing tasks, which can be done in two ways: track at different milestones within the same task, or use a separate observer thread for monitoring.
- Use the same task thread to monitor at different milestones — Within each task thread, at different milestones of query phase execution, we periodically measure the resource consumption of the same task and check whether it has exceeded the threshold. If yes, the task throws a rejected execution exception.
- Use a separate observer thread — At a fixed frequency (every second), a separate observer thread tracks the resource consumption of each task and cancels the task if it breaches the threshold.
| | Same task thread, milestone-based monitoring | Separate observer thread |
|---|---|---|
| Performance overhead | Comparatively higher, because each search task does additional work to track itself at different milestones. | Lower, because a separate thread does the tracking and the search task thread does little additional work. |
| Accuracy | More accurate, since the query phase cancellable listener gets executed multiple times within a second. But if a search request is stuck in a phase, we cannot track it until it reaches the next checkpoint. | Less accurate, since how closely each task is tracked depends on the observer's execution frequency. Also, a larger number of tasks means each task is tracked less frequently. |
| Code maintainability | Adds some overhead, because the query phase logic is interleaved with monitoring logic; any refactoring needs to take care of both. | The observer thread logic is completely independent of search, so it does not add maintenance overhead. |
Preferred approach - A separate observer thread is preferred due to its low performance overhead, simplicity, and code maintainability. Although tracking via the cancellable listener would track tasks more closely (multiple invocations within a second), we do not gain much by tracking at sub-second frequency. Also, if a task gets stuck, it cannot be identified until it reaches the next checkpoint.
Implementation Details
A separate observer thread will be running and will execute the following logic every 1 second (a sketch of this loop, using the settings listed below, follows the settings list):
- Consider the node to be in duress if one of the below thresholds is met:
  - JVM MP on the node (> 70%) for 3 minutes
  - CPU utilization on the node (> 80%) for 3 minutes
- The following values are continuously tracked:
  - resourceConsumptionCompletedTasksMovingAvg — Moving average of the resource consumption of tasks once they complete, over a window of the last 100 tasks. This value would be maintained and exposed by TaskResourceTrackingService.
  - resourceConsumptionCurrentTasksAvg — Average resource consumption of the currently executing query phase tasks at that point in time.
- If the node is in duress:
  - Check whether any of the currently executing tasks have exceeded the configurable limits of ShardSearchTaskMemoryThreshold or ShardSearchCPUCyclesThreshold. If yes, cancel the task.
  - Check whether at least 5% of total heap is occupied by currently executing search tasks, or whether search tasks contribute at least 10% of CPU. If not, return. [This implies that search requests are not contributing to the heap memory pressure.]
  - Get the top 10 tasks based on resource consumption and execute the following steps:
    - Take the max of the two averages above and compare it with each of the top 10 tasks. If the variance is higher than the threshold (1.5x if it is completedTasksAvg, or 2x if it is currentTasksAvg), cancel the task.

Taken alone, resourceConsumptionCurrentTasksAvg would be much lower, and we might mistake a task that is merely nearing completion for a rogue query. Hence we take both the completed tasks average and the currently executing tasks average into account.
Configurable Settings
All the below settings would be dynamically configurable.
- JVM MP Threshold (Default 70%)
- CPU Utilization Threshold (Default 80%)
- CompletedTasksVarianceThreshold (Default 1.5)
- RunningTasksVarianceThreshold (Default 2.0)
- ShardSearchTaskMemoryThreshold (Default 1% of total heap)
- ShardSearchCPUCyclesThreshold (Default 10 secs)
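Below is a minimal sketch of the observer loop under the thresholds and variance rules described above. All class and member names (ObserverLoop, TaskSnapshot, etc.) are illustrative stand-ins rather than OpenSearch classes, and the defaults mirror the settings listed above.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative observer loop run once per second; TaskSnapshot is a hypothetical
// view of a running shard search task (heap bytes, CPU nanos, cancel() hook).
final class ObserverLoop {
    interface TaskSnapshot {
        long heapBytes();
        long cpuNanos();
        void cancel(String reason);
    }

    // Defaults from the proposal; all would be dynamic settings in practice.
    static final double HEAP_VARIANCE_COMPLETED = 1.5;
    static final double HEAP_VARIANCE_RUNNING = 2.0;
    static final long TASK_CPU_LIMIT_NANOS = 10_000_000_000L;                     // 10 seconds
    static final long TASK_HEAP_LIMIT_BYTES = Runtime.getRuntime().maxMemory() / 100; // ~1% of heap

    void tick(boolean nodeInDuress,
              List<TaskSnapshot> runningSearchTasks,
              double completedTasksMovingAvgHeap,
              double currentTasksAvgHeap,
              long totalSearchHeapBytes,
              long totalHeapBytes) {
        if (!nodeInDuress) {
            return;
        }
        // Per-task hard limits apply regardless of averages.
        for (TaskSnapshot task : runningSearchTasks) {
            if (task.cpuNanos() > TASK_CPU_LIMIT_NANOS || task.heapBytes() > TASK_HEAP_LIMIT_BYTES) {
                task.cancel("task exceeded per-task CPU/heap limit");
            }
        }
        // Only continue if search tasks are a meaningful share of heap pressure.
        if ((double) totalSearchHeapBytes / totalHeapBytes < 0.05) {
            return;
        }
        double reference = Math.max(completedTasksMovingAvgHeap, currentTasksAvgHeap);
        double varianceFactor = completedTasksMovingAvgHeap >= currentTasksAvgHeap
            ? HEAP_VARIANCE_COMPLETED : HEAP_VARIANCE_RUNNING;
        // Examine the top 10 tasks by heap usage and cancel clear outliers.
        runningSearchTasks.stream()
            .sorted(Comparator.comparingLong(TaskSnapshot::heapBytes).reversed())
            .limit(10)
            .filter(t -> t.heapBytes() > varianceFactor * reference)
            .forEach(t -> t.cancel("task heap usage far above average consumption"));
    }
}
```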
PoC Test Results
We did sample runs on the nyc_taxis workload with both approaches. Please find the comparison test results here. Also find the comparison with the base run for both approaches here: compare with same thread, compare with observer.
- We could see that using a separate observer thread has lower performance overhead.
- When the workload had a mix of query types, heavy queries like aggregations had a few task rejections, while the light-weight queries continued to execute and did not face any errors.
- Due to GC runs, we observed large fluctuations in JVM MP. Should we consider subsequent datapoints to decide whether the JVM is under pressure?
Few possible scenarios and how they are handled
- The cluster receives a heterogeneous mix of queries and also some rogue queries which consume a lot of resources. Rogue queries are expected to get cancelled.
  - resourceConsumptionCompletedTasksMovingAvg tracks the average consumption of completed tasks. If the rogue queries spike beyond the variance threshold, they will be cancelled.
- The cluster initially receives light-weight search requests alone. After more than 100 requests, heavy-workload requests start flowing. We should not be cancelling all heavy-workload tasks.
  - resourceConsumptionCompletedTasksMovingAvg will be low due to the average from light-weight requests, but resourceConsumptionCurrentTasksAvg will be higher due to the heavy workload. We compare the max of these two and use resourceConsumptionCurrentTasksAvg to measure the variance of each task. Hence heavy-workload requests do not get cancelled unless a task consumes 2x more than the average among heavy-workload requests.
- The cluster receives an almost homogeneous search workload, but some tasks might be just starting and some may be nearing completion. Naturally, the tasks nearing completion will have consumed more resources, but they should not be mistaken for resource-intensive tasks and cancelled.
  - resourceConsumptionCurrentTasksAvg would be lower, but resourceConsumptionCompletedTasksMovingAvg would be higher since it tracks resource consumption at completion. Since we compare each task's variance against resourceConsumptionCompletedTasksMovingAvg, even tasks which have just started would not get cancelled.
- The cluster initially receives all heavy requests. After some time, it starts receiving only light requests.
  - The completed-tasks moving average would be higher, but the current-tasks average would be lower. We take the max of these two values, so we still use the completed-tasks average, which would not cancel any light-weight tasks.
- The cluster receives a good mix of light and heavy search queries. We should only cancel highly resource-intensive queries, if any, and no other queries should be impacted.
  - The completed-tasks average would be higher due to the heavy search queries. Unless a task has a huge variance from the average, no task would get cancelled.
Thanks @nssuresh2007
Preferred approach - Separate observer thread is preferred due to its low performance overhead, simplicity and code maintainability.
+1 to that, collecting metrics in separate thread makes a lot of sense.
A separate observer thread will be running which will execute the following logic every 1 second.
On a general note, I think the cancellation logic should be smarter than just making the decision based on collected metrics. As an example, there could be only one heavy query (scattered across many tasks) which keeps the cluster busy for a long time. Yes, it is probably not good overall, but if the cluster is able to fulfill it (w/o timeout), it is not as bad as it looks. Consulting the picture at large - what the cluster is doing - is probably necessary, otherwise such queries will always be cancelled, even if the cluster is otherwise doing nothing, no matter which replica they are retried on.
Also, the cancellation should take the age of the search request into account; it is highly likely that serving recent requests is more valuable, since the client may have already given up waiting for the older ones. This is somewhat related to "some tasks might be starting just now and some may be nearing completion":
because, by and large, it is not possible to tell whether a task is nearing completion (without heavy instrumentation of the steps being executed), the heuristic in this case is the age of the task / search request. If a timeout is configured, it is yet another mechanism which could influence the decision to cancel versus rejecting new search requests: otherwise there could be long-running queries which have no chance to complete despite the user's willingness to wait.
Now, the most important question to me is: what should the user do when the search request execution is cancelled with "Task is cancelled due to high resource consumption"? She will probably retry it (maybe even several times) - is that what we want? Of all the indicators, heap consumption is probably the most critical sign of coming trouble - should we recommend increasing the heap size? In general, we need to provide meaningful advice on what the user should do to succeed.
I think one of the missed features, which could enormously help in decision making, is the search query cost estimation: taking the search query and cluster configuration + topology, estimate the resource consumption (costs). This is obviously out of scope of this particular issue.
There's a lot of great stuff in this proposal.
A complement to a resource consumption cancellation could be a quality of service-based evaluation. Meaning that instead of detecting the increase in resource consumption as a red flag (rightfully noted as potentially not a problem), we would attempt to detect that the cluster is deteriorating in its ability to provide a certain quality of service, which would cause the limits of what is acceptable resource consumption to be lowered, and more heavy requests to be cancelled.
This is because I want consistency in my service. If a cluster serves X requests successfully with Y quality (time, resources), I want to ensure that if I see X + 1 requests, that 1 addition does not produce less than Y quality overall for each of X requests, and would prefer reject or cancel that 1 extra request before quality degrades.
So, since I think we all want to see a "Task was cancelled before it took the cluster down" error message, consider the alternative or complementary QoS measure to the above proposals of adjusting thresholds. Having moving averages as proposed above is possibly not the right metric. What we will want to look at will be percentiles of operations succeeding within a certain threshold.
Thanks a lot for the feedback and your time @reta and @dblock!
Few points I wanted to add @reta:
Consulting the picture at large - what cluster is doing - is probably necessary, otherwise such queries will be always cancelled, despite cluster doing nothing, no matter which replica they are retried on.
The decision to trigger task cancellation is made only if the node is in duress and search tasks have contributed a significant portion of the load. Hence such queries would not get cancelled on a normally operating cluster.
Also, the cancellation should take the age of the search request into account, it is highly likely that serving recent requests is more valuable, since the client may already gave up waiting for the older ones.
Yes, it is a very valid point. We can use the elapsed time for each task to prioritize older requests to be cancelled.
Now, the most important question to me is: what should the user do when the search request execution is cancelled with "Task is cancelled due to high resource consumption"? She will probably retry it (may be even several times), is that what we want?
We expect the client behavior to be similar to when an ESRejectedExecutionException is thrown by the cluster: it means the cluster is overloaded, and the client is expected to retry with sufficient backoff. If partial results are allowed, we would return results from the other shards whose tasks were not cancelled.
Should we recommend increasing the heap size?
Since it depends on the workload (indexing and search), a recommendation to increase heap size might not always be applicable. Please let me know your thoughts.
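To illustrate the retry-with-backoff expectation mentioned above, here is a minimal, hypothetical client-side sketch. It is not tied to any particular OpenSearch client API; it simply treats a rejection or backpressure cancellation surfaced as an exception as a retryable condition.

```java
import java.util.concurrent.Callable;

// Hypothetical client-side retry helper: retries a search call with exponential
// backoff when the cluster signals overload (rejection or backpressure cancellation).
final class SearchRetry {
    static <T> T withBackoff(Callable<T> search, int maxAttempts) throws Exception {
        long delayMillis = 200;                       // initial backoff
        for (int attempt = 1; ; attempt++) {
            try {
                return search.call();
            } catch (Exception e) {                   // assume caller maps 429/cancellation to this
                if (attempt >= maxAttempts) {
                    throw e;                          // give up after maxAttempts
                }
                Thread.sleep(delayMillis);
                delayMillis = Math.min(delayMillis * 2, 10_000); // cap the backoff
            }
        }
    }
}
```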
Just a few follow-up thoughts on your comments @dblock:
A complement to a resource consumption cancellation could be a quality of service-based evaluation. So, since I think we all want to see a "Task was cancelled before it took the cluster down" error message, consider the alternative or complementary QoS measure to the above proposals of adjusting thresholds.
There are multiple parts that we want to build as a part of Search Back-pressure, as mentioned here: https://github.com/opensearch-project/OpenSearch/issues/1042
(1) Recovering a node under duress - Once a node has gone into duress, we will identify and cancel the most resource-guzzling tasks to prevent node meltdown and help it recover.
(2) Back-pressure for node-to-node resiliency - Track heap memory consumption at the shard level, build a co-ordinator-level view of consumption for each search request, and decide whether to accept or reject the request.
(3) Manage resource utilization effectively across the cluster - Estimate the query cost of each incoming query and check whether sufficient resources are available before admitting the request.
We are targeting only recovering a node in duress with item (1) above, by cancelling resource-guzzling tasks. Unlike indexing operations, the resource utilization of a search request is hard to estimate, since it depends on multiple factors such as query type, its constructs (contexts and clauses), aggregation type, number of buckets, cardinality/fan-out (number of shards to search), and the number of documents to be returned in the response. Hence, while working on (3), we will build query cost estimation, check whether QoS has deteriorated, and take decisions accordingly. This issue targets only item (1), where we only reactively try to recover a node after it has gone into distress. Please let me know if that makes sense.
Having moving averages as proposed above is possibly not the right metric.
We are trying to identify the most resource-guzzling task among the currently executing ones by checking the variance of each task's resource consumption from the average. To ensure that a task nearing completion is not mistaken for a rogue task when compared to tasks that have just started execution, we use the moving average of completed tasks' resource consumption as a reference minimum.
Please let me know your thoughts.
Agree with @dblock on the QoS, which is also the eventual goal once we are able to add more granular instrumentation on the latency breakdown across various layers and N/W interactions. This is also pretty challenging since the QoS could degrade not just because of an overload but also due to I/O slowdown. Some of the work to track gray failures is being taken up as a part of #4244. Once we have that level of details, we could maybe take the right action more deterministically.
I think the current proposal lays down steps to prevent a cluster from getting into an unmanageable state by applying load shedding mechanisms, allowing the cluster to recover
@tushar-kharbanda72: Today, Sep 07, is the code freeze date for OpenSearch 2.3. If this is not planned for 2.3, can you please update the label accordingly?
@reta @dblock @Bukhtawar @sachinpkale Would like to have your inputs on the metrics that would be added to the stats API as a part of this change. The purpose of these metrics is to give the user some idea of why requests are or are not rejected and what the current task resource consumption looks like.
Following is the additional metadata that would be added to the stats API:
"search_backpressure": {
"stats": {
"node_stats": {
"<node_id_1>": {
"current_max_search_task_heap_memory_consumed": 0,
"current_avg_search_task_heap_memory_consumed": 0,
"current_max_search_task_cpu_time_consumed": 0,
"current_avg_search_task_cpu_time_consumed": 0,
"current_total_search_task_heap_memory_consumed": 0
},
"<node_id_2>": {
"current_max_search_task_heap_memory_consumed": 0,
"current_avg_search_task_heap_memory_consumed": 0,
"current_max_search_task_cpu_time_consumed": 0,
"current_avg_search_task_cpu_time_consumed": 0,
"current_total_search_task_heap_memory_consumed": 0
}
},
"cancellation_stats": {
"<node_id_1>": {
"search_task_cancellation_count": 0,
"last_cancelled_task_memory_consumed": 0,
"last_cancelled_task_cpu_consumed": 0
},
"<node_id_2>": {
"search_task_cancellation_count": 0,
"last_cancelled_task_memory_consumed": 0,
"last_cancelled_task_cpu_consumed": 0
}
}
},
"limits": {
"search_task_memory_limit_bytes": 0,
"search_task_cpu_time_limit": 0
},
"enabled": true,
"enforced": false
}
cancellation_stats - Stats from the time the OpenSearch process came up.
- search_task_cancellation_count — Count of task cancellations done on that node since the process started.
- last_cancelled_task_memory_consumed — Heap memory consumption in bytes for the most recently cancelled task.
- last_cancelled_task_cpu_consumed — CPU cycles spent in nanoseconds for the most recently cancelled task.

node_stats - Current stats at that point in time.
- current_max_search_task_heap_memory_consumed — Maximum heap consumption in bytes per task among the currently running tasks on that node.
- current_avg_search_task_heap_memory_consumed — Average heap consumption in bytes per task among the currently running tasks on that node.
- current_avg_search_task_cpu_time_consumed — Average CPU cycles in nanoseconds per task among the currently running tasks on that node.
- current_max_search_task_cpu_time_consumed — Maximum CPU cycles in nanoseconds spent by a task among the currently running tasks on that node.
- current_total_search_task_heap_memory_consumed — Current total heap memory in bytes consumed by all running tasks.

enabled — Whether search backpressure is enabled or not.
enforced — If set to true, tasks would be cancelled based on the criteria. If false, only logs and metrics are emitted, which is useful for a shadow mode.
Kindly let me know your comments on the above.
I think the shape of this response doesn't align with other APIs, but I could be wrong:
- is search_task in current_max_search_task_heap_memory_consumed redundant? Looks like other APIs would call this current_max_heap_memory_consumed
- in the stats doc I see _in_metric everywhere, so should it be current_max_heap_memory_consumed_in_bytes?
- existing APIs seem to have some logical grouping (e.g. "limit" vs. "current")
@dblock, thanks a lot for your feedback.
I have updated the structure as follows to address your comments and have also revamped it to make it easily extensible for future needs (i.e. currently we only emit stats on the shard search task; in future we may also add stats on co-ordinator tasks).
"search_backpressure": {
"current_stats": {
"search_shard_task": {
"heap_memory_consumed_bytes": {
"current_avg": 0,
"current_max": 0,
"rolling_avg": 0
},
"cpu_time_consumed_nanos": {
"current_max": 0,
"current_avg": 0
},
"elapsed_time_nanos": {
"current_max": 0,
"current_avg": 0
}
}
},
"cancellation_stats": {
"search_shard_task": {
"cancellation_count": 0,
"cancellation_breakup": {
"heap_memory_limits": 0,
"cpu_cycle_limits": 0,
"elapsed_time_limits": 0
},
"cancellation_limit_reached_count": 0,
"last_cancelled_task": {
"memory_consumed_bytes": 0,
"cpu_consumed_nanos": 0,
"elapsed_time_nanos": 0
}
}
},
"enabled": true,
"enforced": true
}
- current_stats - Section which contains the stats at that point in time.
- cancellation_stats - Section which contains the stats on cancellation of tasks since the start-up time of the process.
- task_cancellation_count — Count of task cancellations done on that node since the process started.
- last_cancelled_task — Section which contains the resource consumption stats of the most recently cancelled task.
- search_shard_task — Section which contains the stats specific to the search shard task.
- heap_memory_consumed_bytes — Section containing stats on the heap memory consumed by the task, in bytes.
- cpu_time_consumed_nanos — Section containing stats on the CPU cycles consumed by the task, in nanoseconds.
- elapsed_time_nanos — Section containing stats on the elapsed time of the task, in nanoseconds.
- current_max - Maximum value of resource consumption among currently executing tasks. (Applicable for heap_memory_consumed, cpu_time_consumed, elapsed_time)
- current_avg - Average value of resource consumption among currently executing tasks. (Applicable for heap_memory_consumed, cpu_time_consumed, elapsed_time)
- rolling_avg - Average task heap memory consumption of the most recent tasks within the rolling window. (Applicable for the heap_memory_consumed section alone)
- enabled — Whether search backpressure is enabled or not.
- enforced — If set to true, tasks would be cancelled based on the criteria. If false, only logs and metrics are emitted, which is useful for a shadow mode.
Metrics behavior in Shadow vs Enforced mode
In Enforced mode, all the stats present in the response would be populated. In Shadow mode, the stats under the cancellation_stats section would not be populated and would always be zero, since actual task cancellation does not happen. Values under current_stats would still be relevant and correctly populated in shadow mode as well.
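A minimal illustration of how the enforced flag could gate the behaviour described above; the class and method names are hypothetical and the metric/log calls are placeholders.

```java
// Hypothetical gate for shadow vs. enforced mode: in shadow mode we only log
// the would-be cancellation; in enforced mode we actually cancel the task.
final class BackpressureMode {
    private volatile boolean enabled = true;
    private volatile boolean enforced = false; // false == shadow mode

    void onTaskMarkedForCancellation(long taskId, Runnable cancelAction, Runnable recordStats) {
        if (!enabled) {
            return;
        }
        recordStats.run();            // current_stats-style tracking happens in both modes
        if (enforced) {
            cancelAction.run();       // enforced mode: actually cancel the task
        } else {
            System.out.println("[shadow] would cancel task " + taskId); // log-only in shadow mode
        }
    }
}
```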
Response to your comments:
is search_task in current_max_search_task_heap_memory_consumed redundant, looks like other APIs would call this current_max_heap_memory_consumed
Removed the redundant "search_task" text in every field.
in the stats doc I see _in_metric everywhere, so should it be current_max_heap_memory_consumed_in_bytes?
Updated.
existing APIs seem to have some logical grouping (e.g. "limit" vs. "current")
Removed the limit section since they are already listed as a part of cluster settings API and not duplicating them here again.
Kindly let me know your thoughts.
I like this better!
I really like this proposal! I have a couple questions/comments:
- I think we should aim to have milestone 2 done before we mark this as production ready. Milestone 1 looks like it might leave too many ongoing tasks across shards even if one search on a shard is cancelled, and so it is difficult to reconcile with what the user would expect (partial results, query failure, or something else).
- Is the task cancellation manual via an API, automatic, or both? I think we would want both to be available. Automatic will prevent cluster outages, but as an administrator, one might want the ability to cancel a long running query independent of cluster health in order to avoid that query from taking up resources that they would rather have allocated to other queries.
- I agree with @dblock that we should be also looking at the QoS based approach alongside the one we have documented here.
I think we're missing logging for cancelled tasks similar to slow query logging. Users will want to turn on something like "cancellable tasks logging" to enable dumping the query body in the logs upon cancellation to debug whether the cancellation is typical for one problematic query or index.
in search_shard_task — I'm not seeing a shard ID being returned. I imagine this is because you are querying a specific shard already, but including this information in the response might be helpful.
We are proposing the following update to the stats API structure in order to address the following points:
- Include shard ID information in the cancellation breakup, which would be useful. (Addresses the comment from @kgcreative)
- search_shard_task is made the top-level node within which all metadata related to the search shard task is added. With this, we are no longer tied to using the same structure for future milestones, and thus it is more flexible.
- current_stats is renamed to resource_tracker_stats since this section now contains the cancellation_count as well, and hence no longer contains only "current" stats.
Please let me know if you have any comments on the below stats API structure:
"search_backpressure": {
"search_shard_task": {
"resource_tracker_stats": {
"heap_memory_consumed_bytes": {
"current_avg": 0,
"current_max": 0,
"rolling_avg": 0,
"cancellation_count": 0
},
"cpu_time_consumed_nanos": {
"current_max": 0,
"current_avg": 0,
"cancellation_count": 0
},
"elapsed_time_nanos": {
"current_max": 0,
"current_avg": 0,
"cancellation_count": 0
}
},
"cancellation_stats": {
"cancellation_count": 0,
"shard_cancellation_count": {
"<shard_id_1>": 0,
"<shard_id_2>": 0
},
"cancellation_limit_reached_count": 0,
"last_cancelled_task": {
"memory_consumed_bytes": 0,
"cpu_consumed_nanos": 0,
"elapsed_time_nanos": 0
}
}
},
"enabled": true,
"enforced": true
}
@tushar-kharbanda72 I've added this to the roadmap per the 2.4 label and wanted to confirm this is on track for 2.4. Thanks!
@elfisher Yes, we are on track for 2.4 release for Milestone 1 (Tracking and cancellation of search shard tasks alone. Does not include co-ordinator tasks).
In general I agree with the approach above and the direction it is taking. A point to consider: cancelling asynchronous or long-running queries without consideration of priority, or without the ability to constrain/sandbox a query, might be an over-simplification of the issue. I might want to run a query that is going to be long-running and scan a large data set; it might be slow and take time, but the results are important. Cancelling should be employed when identifying a rogue query/task, and I understand that is the primary intention here. But if we want to automate this, we need to consider the priority and kind of query as well, so QoS and query cost might be expensive to build but required. I think the uber goal would be to be able to prioritize and control resource consumption at execution time. This might not fit directly into this issue, but it is something to consider as we look at query execution overall.
@tushar-kharbanda72 do you still track this for the 2.4 release? Code freeze is on 11/3. Is there anything pending? Otherwise, feel free to close it.
@nssuresh2007 are you on track for 2.4 release? Today is the code freeze.
@anasalkouz Yes, as per the plan, code changes for Milestone 1 of this issue are merged to 2.4.
Performance comparison after the Search Backpressure changes
Summary: We did not see any degradation in performance due to Search BP.
Domain Setup
Data nodes: r5d.large, 2 nodes
Master node: r5d.large
Workload: nyc_taxis
5 primary shards, 1 replica shard
Benchmark command used:
opensearch-benchmark execute_test --workload nyc_taxis --include-tasks="default, range, distance_amount_agg, autohisto_agg, date_histogram_agg" --pipeline=benchmark-only --target-hosts=172.31.24.13:9200,172.31.19.56:9200 --workload-params "search_clients:10"
Baseline: Without Search BP changes
Contender: With Search BP enabled and enforced
Detailed Results
[ec2-user@ip-172-31-31-148 ~]$ opensearch-benchmark compare --baseline 127a9078-dad7-4f87-8c51-5c5e89fae478 --contender ef93e21a-8263-44a3-a886-84d912bb34f7
____ _____ __ ____ __ __
/ __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__
/ / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_|
/_/
Comparing baseline
TestExecution ID: 127a9078-dad7-4f87-8c51-5c5e89fae478
TestExecution timestamp: 2022-11-17 09:16:43
TestProcedure: append-no-conflicts
ProvisionConfigInstance: external
with contender
TestExecution ID: ef93e21a-8263-44a3-a886-84d912bb34f7
TestExecution timestamp: 2022-11-17 10:11:13
TestProcedure: append-no-conflicts
ProvisionConfigInstance: external
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
| Metric | Task | Baseline | Contender | Diff | Unit |
|--------------------------------------------------------------:|--------------------:|------------:|------------:|---------:|-------:|
| Cumulative indexing time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 0 | 0 | 0 | min |
| Cumulative merge count of primary shards | | 0 | 0 | 0 | |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative refresh time of primary shards | | 0 | 0 | 0 | min |
| Cumulative refresh count of primary shards | | 16 | 16 | 0 | |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Cumulative flush time of primary shards | | 0 | 0 | 0 | min |
| Cumulative flush count of primary shards | | 4 | 4 | 0 | |
| Min cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Total Young Gen GC time | | 3.157 | 1.695 | -1.462 | s |
| Total Young Gen GC count | | 714 | 177 | -537 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 45.1477 | 45.1477 | 0 | GB |
| Translog size | | 4.09782e-07 | 4.09782e-07 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 100 | 100 | 0 | |
| Min Throughput | default | 2.69059 | 2.67181 | -0.01878 | ops/s |
| Mean Throughput | default | 2.86302 | 2.83447 | -0.02856 | ops/s |
| Median Throughput | default | 2.88521 | 2.84319 | -0.04202 | ops/s |
| Max Throughput | default | 2.93564 | 2.91102 | -0.02462 | ops/s |
| 50th percentile latency | default | 5876.45 | 5252 | -624.449 | ms |
| 90th percentile latency | default | 10509.5 | 8915.29 | -1594.18 | ms |
| 99th percentile latency | default | 16759.6 | 11515.6 | -5243.97 | ms |
| 99.9th percentile latency | default | 19499.4 | 12009.4 | -7489.99 | ms |
| 100th percentile latency | default | 19502.8 | 13962.1 | -5540.67 | ms |
| 50th percentile service time | default | 1088.66 | 967.335 | -121.329 | ms |
| 90th percentile service time | default | 7847.06 | 6894.62 | -952.439 | ms |
| 99th percentile service time | default | 11214 | 11282.5 | 68.4265 | ms |
| 99.9th percentile service time | default | 12032.7 | 11944.4 | -88.277 | ms |
| 100th percentile service time | default | 12305.8 | 11966.6 | -339.18 | ms |
| error rate | default | 0 | 0 | 0 | % |
| Min Throughput | range | 0.694275 | 0.688415 | -0.00586 | ops/s |
| Mean Throughput | range | 0.696822 | 0.69355 | -0.00327 | ops/s |
| Median Throughput | range | 0.697106 | 0.694122 | -0.00298 | ops/s |
| Max Throughput | range | 0.698065 | 0.696061 | -0.002 | ops/s |
| 50th percentile latency | range | 4482.7 | 3537.39 | -945.315 | ms |
| 90th percentile latency | range | 7434 | 6644.22 | -789.78 | ms |
| 99th percentile latency | range | 11593.8 | 10287.5 | -1306.29 | ms |
| 99.9th percentile latency | range | 12456.5 | 11315.5 | -1140.94 | ms |
| 100th percentile latency | range | 12460.8 | 11315.7 | -1145.16 | ms |
| 50th percentile service time | range | 4478.09 | 3534.5 | -943.581 | ms |
| 90th percentile service time | range | 7429.25 | 6637.51 | -791.743 | ms |
| 99th percentile service time | range | 11590.1 | 10284.6 | -1305.59 | ms |
| 99.9th percentile service time | range | 12456.2 | 11308 | -1148.19 | ms |
| 100th percentile service time | range | 12459 | 11314.3 | -1144.64 | ms |
| error rate | range | 0 | 0 | 0 | % |
| Min Throughput | distance_amount_agg | 1.94796 | 1.9031 | -0.04486 | ops/s |
| Mean Throughput | distance_amount_agg | 1.98886 | 1.95379 | -0.03507 | ops/s |
| Median Throughput | distance_amount_agg | 1.99014 | 1.95777 | -0.03237 | ops/s |
| Max Throughput | distance_amount_agg | 1.99999 | 1.97455 | -0.02545 | ops/s |
| 50th percentile latency | distance_amount_agg | 4270.56 | 3670.89 | -599.661 | ms |
| 90th percentile latency | distance_amount_agg | 7183.21 | 6759.99 | -423.225 | ms |
| 99th percentile latency | distance_amount_agg | 9472.4 | 8844.1 | -628.297 | ms |
| 99.9th percentile latency | distance_amount_agg | 10226.2 | 9790.65 | -435.596 | ms |
| 100th percentile latency | distance_amount_agg | 10285.7 | 9790.78 | -494.954 | ms |
| 50th percentile service time | distance_amount_agg | 3631.92 | 3152.59 | -479.324 | ms |
| 90th percentile service time | distance_amount_agg | 7078.86 | 6639.41 | -439.452 | ms |
| 99th percentile service time | distance_amount_agg | 9393.15 | 8843.08 | -550.071 | ms |
| 99.9th percentile service time | distance_amount_agg | 9896.1 | 9789.52 | -106.583 | ms |
| 100th percentile service time | distance_amount_agg | 9896.23 | 9790.48 | -105.753 | ms |
| error rate | distance_amount_agg | 0 | 0 | 0 | % |
| Min Throughput | autohisto_agg | 1.46191 | 1.45333 | -0.00858 | ops/s |
| Mean Throughput | autohisto_agg | 1.48084 | 1.47418 | -0.00666 | ops/s |
| Median Throughput | autohisto_agg | 1.48235 | 1.47627 | -0.00607 | ops/s |
| Max Throughput | autohisto_agg | 1.49368 | 1.48551 | -0.00817 | ops/s |
| 50th percentile latency | autohisto_agg | 5598.59 | 5020.03 | -578.563 | ms |
| 90th percentile latency | autohisto_agg | 7406.7 | 6705.89 | -700.807 | ms |
| 99th percentile latency | autohisto_agg | 8952.92 | 7160.13 | -1792.78 | ms |
| 99.9th percentile latency | autohisto_agg | 11580.6 | 11251 | -329.612 | ms |
| 100th percentile latency | autohisto_agg | 12109.1 | 11375.5 | -733.623 | ms |
| 50th percentile service time | autohisto_agg | 5148.51 | 4952.59 | -195.914 | ms |
| 90th percentile service time | autohisto_agg | 7391.63 | 6704.59 | -687.043 | ms |
| 99th percentile service time | autohisto_agg | 8734.02 | 7158.87 | -1575.15 | ms |
| 99.9th percentile service time | autohisto_agg | 11579.5 | 11249.2 | -330.245 | ms |
| 100th percentile service time | autohisto_agg | 12106.3 | 11374.1 | -732.222 | ms |
| error rate | autohisto_agg | 0 | 0 | 0 | % |
| Min Throughput | date_histogram_agg | 1.47286 | 1.45333 | -0.01953 | ops/s |
| Mean Throughput | date_histogram_agg | 1.48684 | 1.47402 | -0.01282 | ops/s |
| Median Throughput | date_histogram_agg | 1.48777 | 1.47616 | -0.01161 | ops/s |
| Max Throughput | date_histogram_agg | 1.49736 | 1.48482 | -0.01254 | ops/s |
| 50th percentile latency | date_histogram_agg | 5802.18 | 4954.77 | -847.412 | ms |
| 90th percentile latency | date_histogram_agg | 7516.1 | 6699.56 | -816.54 | ms |
| 99th percentile latency | date_histogram_agg | 8955.02 | 7149.38 | -1805.64 | ms |
| 99.9th percentile latency | date_histogram_agg | 11201.3 | 11243.9 | 42.5528 | ms |
| 100th percentile latency | date_histogram_agg | 11576.5 | 11375.5 | -201.003 | ms |
| 50th percentile service time | date_histogram_agg | 5369.37 | 4869.96 | -499.414 | ms |
| 90th percentile service time | date_histogram_agg | 7515.15 | 6699.08 | -816.063 | ms |
| 99th percentile service time | date_histogram_agg | 8908.5 | 7148.5 | -1760 | ms |
| 99.9th percentile service time | date_histogram_agg | 11101.4 | 11241.7 | 140.382 | ms |
| 100th percentile service time | date_histogram_agg | 11576.1 | 11374.4 | -201.663 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
How Search BP behaved with Rogue Query:
We added the following rogue query to the nyc_taxis workload (date_histogram_agg_rogue below):
{"query": {"range":{"dropoff_datetime":{"from":"01/01/2015","to":"15/01/2015",
"include_lower":true,"include_upper":true,"format":"dd/MM/yyyy","boost":1.0}}},
"aggregations":{"dropoffs_over_time":{"date_histogram":{"field":"dropoff_datetime",
"calendar_interval":"second","offset":0,"order":{"_key":"asc"},"keyed":false,
"min_doc_count":0}}}}
Summary
- Without Search BP, the rogue query was occupying a lot of heap and eventually brought down one of the nodes.
- When Search BP was enabled, it was able to cancel the tasks from the rogue query and kept the instance from going down.
- Since one of the nodes went down during the course of the test (when Search BP was disabled), it did not make sense to compare failure rates and latencies between the two tests.
Test setup
Used locust with following configuration:
locustfile = ~/locust/locustfile.py
headless = true
users = 50
spawn-rate = 10
run-time = 1h
This test used same queries used by opensearch-benchmark tool.
With Search BP enabled
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|---------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST auto_histo_agg 908 98(10.79%) | 27443 1 124668 24000 | 0.25 0.03
POST date_histogram_agg 863 75(8.69%) | 39613 1 144642 39000 | 0.24 0.02
POST date_histogram_agg_rogue 180 37(20.56%) | 126966 1 334989 126000 | 0.05 0.01
POST default 928 45(4.85%) | 34300 1 145420 31000 | 0.26 0.01
POST distance_histo 874 107(12.24%) | 26328 1 122862 23000 | 0.24 0.03
POST range 942 37(3.93%) | 36249 1 133501 35000 | 0.26 0.01
--------|---------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
Aggregated 4695 399(8.50%) | 36410 1 334989 30000 | 1.31 0.11
Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 100% # reqs
--------|-------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
POST auto_histo_agg 24000 32000 38000 41000 54000 70000 87000 103000 125000 125000 125000 908
POST date_histogram_agg 39000 48000 55000 59000 76000 89000 106000 118000 145000 145000 145000 863
POST date_histogram_agg_rogue 129000 151000 162000 177000 211000 244000 268000 275000 335000 335000 335000 180
POST default 31000 42000 48000 53000 71000 87000 102000 110000 145000 145000 145000 928
POST distance_histo 23000 31000 37000 40000 55000 70000 88000 96000 123000 123000 123000 874
POST range 35000 45000 50000 56000 73000 86000 105000 113000 134000 134000 134000 942
--------|-------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
Aggregated 30000 41000 48000 54000 74000 96000 131000 159000 257000 335000 335000 4695
Error report
#occurrences Error
------------------|--------------------------------------------------------------------------------------------------------------------------------------
45 POST default: HTTPError('429 Client Error: Too Many Requests for url: default')
33 POST range: HTTPError('429 Client Error: Too Many Requests for url: range')
73 POST date_histogram_agg: HTTPError('429 Client Error: Too Many Requests for url: date_histogram_agg')
98 POST auto_histo_agg: HTTPError('429 Client Error: Too Many Requests for url: auto_histo_agg')
103 POST distance_histo: HTTPError('429 Client Error: Too Many Requests for url: distance_histo')
26 POST date_histogram_agg_rogue: HTTPError('429 Client Error: Too Many Requests for url: date_histogram_agg_rogue')
4 POST range: HTTPError('500 Server Error: Internal Server Error for url: range')
4 POST distance_histo: HTTPError('500 Server Error: Internal Server Error for url: distance_histo')
11 POST date_histogram_agg_rogue: HTTPError('500 Server Error: Internal Server Error for url: date_histogram_agg_rogue')
2 POST date_histogram_agg: HTTPError('500 Server Error: Internal Server Error for url: date_histogram_agg')
------------------|--------------------------------------------------------------------------------------------------------------------------------------
With Search BP disabled
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|---------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST auto_histo_agg 1550 1113(71.81%) | 17322 0 242124 2 | 0.43 0.31
POST date_histogram_agg 1550 1129(72.84%) | 22824 0 269577 2 | 0.43 0.31
POST date_histogram_agg_rogue 273 213(78.02%) | 44443 0 279034 2 | 0.08 0.06
POST default 1522 908(59.66%) | 22112 0 240342 2 | 0.42 0.25
POST distance_histo 1588 1106(69.65%) | 15864 0 220096 2 | 0.44 0.31
POST range 1560 930(59.62%) | 22026 0 241322 2 | 0.43 0.26
--------|---------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
Aggregated 8043 5399(67.13%) | 20834 0 279034 2 | 2.23 1.5
Response time percentiles (approximated)
Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs
--------|-------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
POST auto_histo_agg 2 25000 35000 41000 49000 57000 73000 85000 240000 242000 242000 1550
POST date_histogram_agg 2 34000 47000 52000 66000 80000 94000 123000 243000 270000 270000 1550
POST date_histogram_agg_rogue 2 44000 68000 106000 141000 191000 269000 270000 279000 279000 279000 273
POST default 2 36000 48000 52000 65000 79000 89000 110000 238000 240000 240000 1522
POST distance_histo 2 21000 33000 37000 47000 55000 69000 78000 220000 220000 220000 1588
POST range 2 37000 48000 52000 66000 80000 90000 101000 190000 241000 241000 1560
--------|-------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
Aggregated 2 29000 42000 47000 60000 75000 94000 129000 243000 279000 279000 8043
Error report
#occurrences Error
------------------|--------------------------------------------------------------------------------------------------------------------------------------
168 POST range: HTTPError('429 Client Error: Too Many Requests for url: range')
355 POST auto_histo_agg: HTTPError('429 Client Error: Too Many Requests for url: auto_histo_agg')
367 POST date_histogram_agg: HTTPError('429 Client Error: Too Many Requests for url: date_histogram_agg')
333 POST distance_histo: HTTPError('429 Client Error: Too Many Requests for url: distance_histo')
155 POST default: HTTPError('429 Client Error: Too Many Requests for url: default')
77 POST date_histogram_agg_rogue: HTTPError('429 Client Error: Too Many Requests for url: date_histogram_agg_rogue')
7 POST distance_histo: RemoteDisconnected('Remote end closed connection without response')
7 POST auto_histo_agg: RemoteDisconnected('Remote end closed connection without response')
2 POST default: ConnectionResetError(104, 'Connection reset by peer')
1 POST range: ConnectionResetError(104, 'Connection reset by peer')
9 POST date_histogram_agg: RemoteDisconnected('Remote end closed connection without response')
3 POST range: RemoteDisconnected('Remote end closed connection without response')
5 POST default: RemoteDisconnected('Remote end closed connection without response')
6 POST date_histogram_agg_rogue: RemoteDisconnected('Remote end closed connection without response')
6 POST auto_histo_agg: ConnectionResetError(104, 'Connection reset by peer')
3 POST date_histogram_agg: ConnectionResetError(104, 'Connection reset by peer')
1 POST distance_histo: ConnectionResetError(104, 'Connection reset by peer')
745 POST auto_histo_agg: ConnectionRefusedError(111, 'Connection refused')
758 POST range: ConnectionRefusedError(111, 'Connection refused')
750 POST date_histogram_agg: ConnectionRefusedError(111, 'Connection refused')
765 POST distance_histo: ConnectionRefusedError(111, 'Connection refused')
746 POST default: ConnectionRefusedError(111, 'Connection refused')
130 POST date_histogram_agg_rogue: ConnectionRefusedError(111, 'Connection refused')
------------------|--------------------------------------------------------------------------------------------------------------------------------------
Note: OpenSearch crashed on one of the instances because the heap space limit was reached.
@rramachand21 Is this still on track for v2.6.0? Code freeze is today (Feb 21, 2023)
Also, I'm assuming this is for Milestone 2 (#1329), since https://github.com/opensearch-project/OpenSearch/issues/1181#issuecomment-1302456200 notes that Milestone 1 was released with v2.4.0.
@kartg yes, this is on track for v2.6.0, and yes, it's for milestone 2.
Bumping tag to 2.7.0 since #6455 was merged in for 2.6
Hi @tushar-kharbanda72, this issue will be marked for the next release, v2.8.0, on Apr 17, as that is the code-freeze date for v2.7.0. Please let me know if otherwise.
Tagging it for v2.8.0