
Expose effective watermark thresholds via APIs

Open • pxsalehi opened this issue 11 months ago • 9 comments

The effective low/high/flood watermarks that apply to the cluster depend on both the threshold and the max_headroom settings. To make it easier to see what the effective values are, we could calculate and expose the thresholds under the _nodes/stats and/or _cat/allocation APIs.
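(For illustration, a minimal sketch of how an effective threshold could be derived from these two settings. The class name, method names, and example values here are assumptions for this sketch, not Elasticsearch's internal API.)

    // Hypothetical helper: derives the effective "used bytes" threshold for a
    // disk from a percentage watermark and a max_headroom cap.
    final class EffectiveWatermark {

        // totalBytes: disk capacity; watermarkPercent: e.g. 90.0;
        // maxHeadroomBytes: cap on required free space, e.g. 150 GB (assumed).
        static long effectiveThresholdBytes(long totalBytes, double watermarkPercent, long maxHeadroomBytes) {
            // The percentage watermark requires (100 - percent)% of the disk to stay free...
            long freeRequiredByPercent = (long) (totalBytes * (100.0 - watermarkPercent) / 100.0);
            // ...but max_headroom caps the required free space on large disks.
            long effectiveFreeRequired = Math.min(freeRequiredByPercent, maxHeadroomBytes);
            return totalBytes - effectiveFreeRequired;
        }

        static double effectivePercent(long totalBytes, double watermarkPercent, long maxHeadroomBytes) {
            return 100.0 * effectiveThresholdBytes(totalBytes, watermarkPercent, maxHeadroomBytes) / totalBytes;
        }
    }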

pxsalehi avatar Mar 22 '24 13:03 pxsalehi

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine avatar Mar 22 '24 13:03 elasticsearchmachine

Hi @pxsalehi, do we need to add extra disk watermark threshold columns to the _cat/allocation API? Since the thresholds would be the same for each node, is it necessary to add them to the following default columns?

_cat/allocation:

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    16        5.7tb     5.9tb    958.9gb      6.9tb           86 127.0.0.1 127.0.0.1 node-1

howardhuanghua avatar Mar 23 '24 04:03 howardhuanghua

I think the idea is, for example, to have a new column for each watermark (also available in _nodes/stats) that would provide the effective watermark as a percentage. E.g. if the default watermark is 90% but the max_headroom is what actually decides the watermark, calculate the effective value based on that; for a large disk it might be 99%.
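(To make the arithmetic concrete, a worked example using the hypothetical EffectiveWatermark helper sketched earlier; the 16 TB disk size and 150 GB max_headroom are assumed values for illustration only.)

    // Worked example: with a 16 TB disk and a 90% watermark, the percentage
    // rule would demand 1.6 TB free, but an assumed 150 GB max_headroom is
    // the binding constraint, pushing the effective watermark near 99%.
    public class EffectiveWatermarkExample {
        public static void main(String[] args) {
            long totalBytes = 16L * 1024 * 1024 * 1024 * 1024; // 16 TB
            long maxHeadroomBytes = 150L * 1024 * 1024 * 1024; // 150 GB (assumed)
            double pct = EffectiveWatermark.effectivePercent(totalBytes, 90.0, maxHeadroomBytes);
            System.out.printf("effective watermark: %.1f%%%n", pct); // prints ~99.1%
        }
    }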

pxsalehi avatar Mar 25 '24 16:03 pxsalehi

I expect this'll take several PRs to completely address. I'd suggest exposing the raw watermark numbers in GET _nodes/stats first and then we can think about adding columns to GET _cat/allocation in a follow-up PR.

It will certainly be useful to display the actual watermarks as percentage values, but in many cases they're all going to come out as 99%, which isn't really very helpful. We can add some decimal places, but IME folks really want to know the size of the gap (as a bytes value) between the actual disk usage and each of the three watermarks. This is going to be a little tricky since today ByteSizeValue doesn't support negative sizes, and yet here we need some way to represent being both under and over each watermark. Not impossible at all, just a little more complex than it might first appear, bearing in mind that we must integrate properly with the ?s= and ?bytes= query parameters. I might suggest adding the percentages in one PR and then thinking about these more useful columns in another.
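(A minimal sketch of the signed "gap to watermark" quantity described here; illustration only, with assumed names, not the eventual implementation.)

    // Positive while there is still room below the watermark; negative once
    // the watermark has been exceeded. A real _cat column would additionally
    // need to honour the ?bytes= unit parameter and sort numerically for ?s=.
    record WatermarkGap(long bytes) {
        static WatermarkGap of(long thresholdBytes, long usedBytes) {
            return new WatermarkGap(thresholdBytes - usedBytes);
        }

        boolean exceeded() {
            return bytes < 0;
        }
    }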

DaveCTurner avatar Mar 29 '24 07:03 DaveCTurner

Appreciate the help in https://github.com/elastic/elasticsearch/pull/107244, @DaveCTurner . So next step we are going to add extra threshold columns in _cat/allocation ?

howardhuanghua avatar Apr 18 '24 14:04 howardhuanghua

Yep that's right

DaveCTurner avatar Apr 18 '24 14:04 DaveCTurner

Hi @DaveCTurner, if a node has multiple disk paths, should _cat/allocation then only show the maximum low/high/flood watermark byte size value?

howardhuanghua avatar Apr 27 '24 13:04 howardhuanghua

I don't think there's a good way to represent these values on a node-by-node basis if multiple data paths are in play. I think it would be best to just display something like <multiple> in that case.

However, I would like us to have a column indicating the watermark status of each node (NONE/LOW/HIGH/FLOOD), and also columns showing how far below each watermark each node is (taking the minimum value if there are multiple data paths). That should make sense. But it's a little tricky to do this because a node which is above a watermark would need to represent this as a negative number, and ByteSizeValue doesn't support negative values, so some extra work is needed to get this right.
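(A sketch of how that per-node status and worst-case gap might be computed across multiple data paths. All type and method names here are assumptions for illustration.)

    import java.util.List;

    // Severity-ordered status; constants declared from least to most severe.
    enum WatermarkStatus { NONE, LOW, HIGH, FLOOD_STAGE }

    // Assumed per-path usage figures, in bytes.
    record PathUsage(long usedBytes, long lowBytes, long highBytes, long floodBytes) {}

    final class NodeWatermarks {
        // Most severe status across all of the node's data paths.
        static WatermarkStatus status(List<PathUsage> paths) {
            WatermarkStatus worst = WatermarkStatus.NONE;
            for (PathUsage p : paths) {
                WatermarkStatus s = p.usedBytes() >= p.floodBytes() ? WatermarkStatus.FLOOD_STAGE
                        : p.usedBytes() >= p.highBytes() ? WatermarkStatus.HIGH
                        : p.usedBytes() >= p.lowBytes() ? WatermarkStatus.LOW
                        : WatermarkStatus.NONE;
                if (s.compareTo(worst) > 0) worst = s;
            }
            return worst;
        }

        // Minimum remaining space below the low watermark across paths;
        // negative as soon as any single path exceeds the watermark.
        static long minLowGapBytes(List<PathUsage> paths) {
            return paths.stream().mapToLong(p -> p.lowBytes() - p.usedBytes()).min().orElse(0L);
        }
    }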

DaveCTurner avatar Apr 27 '24 13:04 DaveCTurner

I think it's better to show all three thresholds; this would give a global perspective. I've created an example table for the _cat/allocation result:

shards disk.indices disk.used disk.avail disk.total disk.percent low_wm              high_wm flood_wm host      ip        node
   117        2.7tb     6.7tb      0.2tb      6.9tb           81 340.5gb             560.5gb  750.7gb 127.0.0.1 127.0.0.1 data-node1
   117        2.7tb     6.8tb      0.1tb      6.9tb           87 -38.7gb (above low) 230.8gb  450.7gb 127.0.0.1 127.0.0.1 data-node2
   117        2.7tb     3.2gb      3.5gb      7.0gb           51 -                   -        -       127.0.0.2 127.0.0.2 master-node

howardhuanghua avatar Apr 28 '24 03:04 howardhuanghua

Yes, sorry, to clarify I think we should add seven new columns (names TBD but something like this):

  • disk.watermark.low.threshold, disk.watermark.high.threshold, disk.watermark.flood_stage.threshold: the raw values as calculated in #107244, as a regular ByteSizeValue column, except if the node has multiple data paths then show a placeholder like <multiple>
  • disk.watermark.low.avail, disk.watermark.high.avail, disk.watermark.flood_stage.avail: the available space between the current disk usage and the relevant watermark, as a signed ByteSizeValue column (negative meaning the watermark has been exceeded) taking the minimum across data paths if the node has multiple.
  • disk.watermark.exceeded: a column containing a string NONE, LOW, HIGH or FLOOD_STAGE indicating at a glance which of the watermarks have been exceeded on each node (where sorting on this column should order the NONE rows before the LOW rows, then the HIGH rows, and finally the FLOOD_STAGE rows; see the ordering sketch below).
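(A small sketch of that sort order, reusing the WatermarkStatus enum from the earlier sketch: because the constants are declared in severity order, the natural enum ordering already sorts NONE before LOW before HIGH before FLOOD_STAGE, which a plain alphabetical sort on the column text would not. AllocationRow is a hypothetical stand-in for a _cat table row.)

    import java.util.Comparator;

    record AllocationRow(String node, WatermarkStatus exceeded) {}

    final class ExceededColumn {
        // Sorts by severity, not alphabetically (ascending alphabetical order
        // would put FLOOD_STAGE first and NONE last).
        static final Comparator<AllocationRow> BY_SEVERITY =
                Comparator.comparing(AllocationRow::exceeded);
    }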

DaveCTurner avatar Apr 28 '24 04:04 DaveCTurner

Support for negative sizes in ByteSizeValue: https://github.com/elastic/elasticsearch/pull/107988.
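(For context, a minimal, self-contained illustration of the kind of change involved, not the actual implementation in that PR: both parsing and rendering need to handle a sign, while the magnitude logic stays as before.)

    // Illustration only: signed byte-size parsing/rendering for two units.
    final class SignedBytes {
        static long parse(String value) {
            boolean negative = value.startsWith("-");
            String magnitude = negative ? value.substring(1) : value;
            long bytes;
            if (magnitude.endsWith("gb")) {
                bytes = (long) (Double.parseDouble(magnitude.substring(0, magnitude.length() - 2)) * (1L << 30));
            } else if (magnitude.endsWith("mb")) {
                bytes = (long) (Double.parseDouble(magnitude.substring(0, magnitude.length() - 2)) * (1L << 20));
            } else {
                bytes = Long.parseLong(magnitude); // plain byte count
            }
            return negative ? -bytes : bytes;
        }

        static String render(long bytes) {
            long abs = Math.abs(bytes);
            String magnitude = abs >= (1L << 30)
                    ? String.format("%.1fgb", abs / (double) (1L << 30))
                    : String.format("%.1fmb", abs / (double) (1L << 20));
            return (bytes < 0 ? "-" : "") + magnitude;
        }
    }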

howardhuanghua avatar Apr 28 '24 14:04 howardhuanghua