[BUG] Setting `cluster.routing.allocation.exclude` only works if you specify a single value
Describe the bug
When using cluster.routing.allocation.exclude, for example cluster.routing.allocation.exclude._name to exclude an OpenSearch node from shard allocation, the setting only takes effect if a single value is specified. If more than one value is set, the setting has no effect and all shards are rebalanced as if it were not set at all.
I can verify the setting is applied successfully by viewing _cluster/settings, but OpenSearch ignores it if it contains more than one value.
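For reference, the currently applied values can be inspected with the cluster settings API (flat_settings is optional but makes the exclude keys easier to read):
curl "http://localhost:9200/_cluster/settings?flat_settings=true"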
Related component
Other
To Reproduce
Testing with 5 OpenSearch nodes:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles cluster_manager name
192.168.0.176 15 59 6 0.21 0.16 0.12 dm cluster_manager,data - opsrch1
192.168.0.180 64 80 1 0.00 0.00 0.00 d data - opsrch4
192.168.0.178 43 66 1 0.00 0.01 0.00 d data - opsrch5
192.168.0.179 42 79 1 0.27 0.18 0.18 dm cluster_manager,data - opsrch2
192.168.0.177 45 70 3 0.13 0.15 0.11 dm cluster_manager,data * opsrch3
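For anyone reproducing this, the node listing above is the output of the _cat/nodes API, e.g.:
curl "http://localhost:9200/_cat/nodes?v"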
There are many existing indices, all of which are balanced evenly across all nodes:
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 340.2mb 11.6gb 86.2gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
23 357.8mb 10.6gb 87.2gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 234.1mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
22 349.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
21 131.1mb 10.5gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
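For reference, the per-node shard counts here and in the outputs below are from the _cat/allocation API:
curl "http://localhost:9200/_cat/allocation?v"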
When I exclude opsrch2 via:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": "opsrch2",
"cluster.routing.allocation.enable": "all"
}
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
23 390.1mb 11.6gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
1 99.8kb 10.2gb 87.6gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 281mb 10.6gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
45 695.7mb 10.8gb 87gb 97.9gb 11 192.168.0.180 192.168.0.180 opsrch4
22 132.5mb 10.5gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
All shards do deallocate from this node, as expected. If I change "opsrch2" to a list, e.g. ["opsrch2"], even if it has only a single entry, the setting is completely ignored and shards are rebalanced across all nodes.
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": ["opsrch2"],
"cluster.routing.allocation.enable": "all"
}
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
23 388.9mb 11.6gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
23 307.5mb 10.6gb 87.3gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 281.1mb 10.6gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
24 351.5mb 10.5gb 87.4gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
22 133.9mb 10.5gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
Expected behavior
cluster.routing.allocation.exclude.* allows specifying more than a single node.
Additional Details
Plugins: vanilla, out-of-the-box defaults
Host/Environment:
- OS: Ubuntu Server 22.04 LTS
- Version: OpenSearch 2.13.0
I do have zone allocation awareness set and zone allocation forced:
zoneA (odd-numbered nodes): opsrch1, opsrch3, opsrch5
zoneB (even-numbered nodes): opsrch2, opsrch4
I'm also realizing that the behavior appears very similar to https://github.com/opensearch-project/OpenSearch/issues/1716 in terms of overriding all existing exclusion attributes. Not sure if it's a regression or not, though.
[Triage]
@drewmiranda-gl Thanks for filing. I believe the correct usage is to provide a comma-separated string, e.g. "node1, node2, node3", not a list. The documentation does say "comma separated", but I believe it could be much clearer, because this is indeed confusing. Would you be interested in contributing an update to the documentation website?
I did try both (a list and a comma-separated string), and the setting was accepted successfully when comma separated; however, if more than one value is set, it seems to disable the setting (as if it was never set). Also, setting the value removes all existing values, which only allows you to exclude a single node at a time.
I will retest to be extra sure.
Here is what I observe:
Excluding a single node does work:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": "opsrch2",
"cluster.routing.allocation.enable": "all"
}
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
1 146.1kb 10.3gb 87.6gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
45 718.5mb 10.9gb 87gb 97.9gb 11 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
BUT setting cluster.routing.allocation.exclude._name to another value removes the existing value (this may be expected, I'm not sure):
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": "opsrch4",
"cluster.routing.allocation.enable": "all"
}
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
45 642.6mb 10.9gb 86.9gb 97.9gb 11 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
0 0b 10.2gb 87.7gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
1 UNASSIGNED
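A note on the above, assuming standard cluster-settings behavior: an update replaces the previous value for a key rather than appending to it, so to exclude several nodes the full comma-separated list has to be sent each time. Setting the key to null should clear the exclusion entirely:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": null
}
}'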
Attempting to set both does not work:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"transient": {
"cluster.routing.allocation.exclude._name": "opsrch2,opsrch4",
"cluster.routing.allocation.enable": "all"
}
}'
(same as the above allocation output)
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
45 718.5mb 11gb 86.9gb 97.9gb 11 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
0 0b 10.2gb 87.7gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
So it does appear OpenSearch ignores the setting if more than one value is present and behaves as if it is not set, even though it is indeed set.
Let me know if you have any questions.
Any updates/thoughts on this?
@shwetathareja Maybe you know how this works?
@drewmiranda-gl: comma-separated values should work.
After applying
"cluster.routing.allocation.exclude._name": "opsrch2,opsrch4",
Can you please run _cluster/allocation/explain for a particular shard (you can pick any shard that is assigned to opsrch2) and share the output here? Maybe there is no other eligible node in the cluster for those shards to be assigned to. Do you have zone awareness enabled?
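For reference, a minimal explain request looks something like this (the index name and shard number are placeholders; pick a real shard currently on opsrch2):
curl --request GET --header "Content-Type: application/json" http://localhost:9200/_cluster/allocation/explain --data '{
"index": "my-index",
"shard": 0,
"primary": true
}'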
Another thing you can try is to bounce the OpenSearch process on the "opsrch2" node and see if shards are still getting assigned to it.
I went back to retest this and am now unable to reproduce it. I did accidentally nuke 2 of my VMs (opsrch2 and opsrch4), and it's possible something changed or is different now that I have rebuilt them.
I also realize that I may have had forced zone allocation awareness enabled, causing issues as well:
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": ["zoneA", "zoneB"]
  }
}
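For completeness, and assuming standard settings-reset behavior, the forced awareness settings can be cleared the same way by setting the keys to null:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
"persistent": {
"cluster.routing.allocation.awareness.attributes": null,
"cluster.routing.allocation.awareness.force.zone.values": null
}
}'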
Given I can no longer reproduce this, and it was likely caused by some other configuration I had in the lab, I will close this issue.
Thanks. 🙏