
Add metrics to auto scale based on indexing pressure

tac-emil-andresen opened this issue 1 year ago · 1 comment

Add the metrics indexing_pressure.memory.limit_in_bytes and indexing_pressure.memory.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests because the indexing request memory buffer has reached capacity.
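For context, both values come from the Elasticsearch node stats API. A minimal sketch of the relevant fragment of a GET _nodes/stats/indexing_pressure response (the node ID and numbers here are illustrative, not from a real cluster):

```json
{
  "nodes": {
    "example-node-id": {
      "indexing_pressure": {
        "memory": {
          "current": {
            "all_in_bytes": 768
          },
          "limit_in_bytes": 824180736
        }
      }
    }
  }
}
```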

This change may address the following two issues:

https://github.com/prometheus-community/elasticsearch_exporter/issues/638
https://github.com/prometheus-community/elasticsearch_exporter/issues/875

There is an open pull request from over a year ago attempting to address issue 638:

https://github.com/prometheus-community/elasticsearch_exporter/pull/727

Pull request 727 includes a large number of indexing_pressure-related metrics in its code changes, and the comment chain raises a concern that this could unnecessarily increase cardinality. In this pull request we add only the two metrics required to support auto-scaling based on indexing pressure.

We (Telus Agriculture and Consumer Goods) have a production ES cluster that we are upgrading from v7 to v8. As part of this upgrade, the "elasticsearch_thread_pool_rejected_count" metric has been removed by the ES dev team because they switched from a fixed-length queue of indexing requests with a maximum size to a memory buffer that defaults to 10% of available memory. In the past, when the queue reached capacity, the cluster would start rejecting indexing requests, and you could scale the cluster up to relieve that pressure. Since the queue was eliminated, we need a new way to scale up based on indexing pressure so that we don't fall behind on processing incoming requests. Based on our investigation, the new way to do this is to compare the currently used amount of the indexing memory buffer to its total size. In this PR we add just the two metrics required to achieve auto-scaling based on indexing pressure.
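As a sketch of how the two exported metrics could drive a scaling decision (the metric names match this PR; the 0.75 threshold is an illustrative choice, not a recommendation), a PromQL expression for per-node indexing-pressure utilization might look like:

```promql
# Fraction of the indexing memory buffer currently in use, per node.
elasticsearch_indexing_pressure_current_all_in_bytes
  / elasticsearch_indexing_pressure_limit_in_bytes

# Example scale-up condition: any node above 75% of its buffer.
max(
  elasticsearch_indexing_pressure_current_all_in_bytes
    / elasticsearch_indexing_pressure_limit_in_bytes
) > 0.75
```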

tac-emil-andresen avatar Jun 24 '24 17:06 tac-emil-andresen

Hi. I'm personally putting a $60 US dollar bounty on merging this PR (or an equivalent change) because my team needs it, because the two other issues and the earlier PR suggest there is demand for this beyond just our team, and because open source maintainers are under-appreciated. If you merge the PR, please make sure you have sponsorship set up in GitHub and I'll send you a one-time $60 thank-you.

tac-emil-andresen avatar Jun 27 '24 16:06 tac-emil-andresen

With the change to just include the cluster, host, and node labels rather than all the default labels, here is what the (sanitized) output from /metrics looks like:

elasticsearch_exporter % curl http://localhost:9114/metrics | grep pressure
# HELP elasticsearch_indexing_pressure_current_all_in_bytes Memory consumed, in bytes, by indexing requests in the coordinating, primary, or replica stage.

# TYPE elasticsearch_indexing_pressure_current_all_in_bytes gauge

elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 768
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 280
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 0

# HELP elasticsearch_indexing_pressure_limit_in_bytes Configured memory limit, in bytes, for the indexing requests

# TYPE elasticsearch_indexing_pressure_limit_in_bytes gauge

elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 8.24180736e+08

tac-emil-andresen avatar Jul 06 '24 20:07 tac-emil-andresen

Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?

I've added the sign-off to the last commit. I had not amended a commit on a fork before, so I used this command:

git commit --amend --signoff

Then "git push origin issue/638" failed with the message "Updates were rejected because the tip of your current branch is behind", so I had to push the update to the branch using this command:

git push --force-with-lease origin issue/638

I hope that is correct in this scenario. The lines to be merged still look correct.

Many thanks for your help getting this merged!

tac-emil-andresen avatar Jul 11 '24 16:07 tac-emil-andresen

LGTM. Thanks!

Woo hoo! I promised a $60 bounty to get this merged. Where do you want it to go? If you have sponsorship set up in GitHub I can send it directly to you that way, or via another channel like buymeacoffee.com. Or I can donate it to the charity or open source foundation of your choice.

tac-emil-andresen avatar Jul 11 '24 18:07 tac-emil-andresen

@tac-emil-andresen I don't have sponsorship set up and I'm not sure it's worth it for me. If you want to donate, this is currently my charity of choice: https://www.thefarmette.org/donate Thanks

sysadmind avatar Sep 28 '24 16:09 sysadmind