
Upgrading from ECK 2.4.0 to latest version may fail

Open barkbay opened this issue 3 years ago • 4 comments

Got this error for one of my Elasticsearch clusters while upgrading from 2.4.0 to main:

400 Bad Request: {Status:400 Error:{CausedBy:{Reason: Type:}
Reason:
Desired nodes with history [d7bf9e8e-47a0-40ad-8156-400ae519eb6c] and version [2] already exists with a different definition

I think (not 💯 sure yet) this is because of https://github.com/elastic/cloud-on-k8s/pull/5950: the Elasticsearch configuration has changed, but the metadata.generation of the Elasticsearch resource, used as the version field for the desired nodes API, is still the same.

I wonder if we should have an "upgrade" e2e pipeline, something like our upgrade-test-harness, that should be run automatically when submitting a PR in order to detect this kind of issue.

barkbay avatar Aug 29 '22 15:08 barkbay

Similar problem https://github.com/elastic/cloud-on-k8s/issues/6027 there it is a PVC resize that leads to multiple updates with an unchanging spec.

pebrc avatar Sep 21 '22 14:09 pebrc

We discussed potential solutions and @barkbay suggested two approaches:

  • disabling desired_nodes for the next release
  • switching to a conditional PUT after GET approach:
  1. GET the _latest desired nodes topology from Elasticsearch and compare it with the expected desired nodes
  2. If the topologies are the same, stop
  3. If they differ, take the version returned from the GET call, increment it, and PUT the new topology
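The conditional PUT-after-GET decision above can be sketched as a small pure function. This is only an illustration: `DesiredNodes` and `nextUpdate` are hypothetical names, not types from the operator codebase.

```go
package main

import (
	"fmt"
	"reflect"
)

// DesiredNodes is a simplified stand-in for the payload of the
// Elasticsearch desired nodes API (hypothetical type, for illustration).
type DesiredNodes struct {
	Version int
	Nodes   []string
}

// nextUpdate implements the PUT-after-GET idea: compare the latest
// topology returned by Elasticsearch with the expected one, and decide
// whether a PUT is needed and with which version.
func nextUpdate(latest DesiredNodes, expected []string) (put bool, version int) {
	if reflect.DeepEqual(latest.Nodes, expected) {
		return false, latest.Version // topologies match: nothing to do
	}
	return true, latest.Version + 1 // they differ: PUT with an incremented version
}

func main() {
	latest := DesiredNodes{Version: 2, Nodes: []string{"node-a-4gb", "node-b-4gb"}}

	// Same topology: no PUT needed.
	put, v := nextUpdate(latest, []string{"node-a-4gb", "node-b-4gb"})
	fmt.Println(put, v) // false 2

	// Changed topology: PUT with version 3.
	put, v = nextUpdate(latest, []string{"node-a-8gb", "node-b-8gb"})
	fmt.Println(put, v) // true 3
}
```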

pebrc avatar Sep 22 '22 15:09 pebrc

Comparing the _latest desired nodes returned via the Elasticsearch API with the expected values turns out to be trickier than I thought:

{"service.version": "2.5.0-SNAPSHOT+dff6b534", "iteration": "1", "namespace": "default", "es_name": "autoscaling-sample", "diff": ["slice[0].Settings.map[node].map[store].map[allow_mmap]: string != bool", "slice[0].Settings.map[xpack].map[security].map[http].map[ssl].map[enabled]: string != bool", "slice[0].Settings.map[xpack].map[security].map[authc].map[realms].map[native].map[native1].map[order]: string != int64", "slice[0].Settings.map[xpack].map[security].map[authc].map[realms].map[file].map[file1].map[order]: string != int64", "slice[0].Memory: 3gb != 3221225472b", "slice[0].Storage: 4gb != 4294967296b", "slice[1].Settings.map[xpack].map[security].map[http].map[ssl].map[enabled]: string != bool", "slice[1].Settings.map[xpack].map[security].map[authc].map[realms].map[native].map[native1].map[order]: string != int64", "slice[1].Settings.map[xpack].map[security].map[authc].map[realms].map[file].map[file1].map[order]: string != int64", "slice[1].Settings.map[node].map[store].map[allow_mmap]: string != bool"]}

Elasticsearch transforms the submitted data: it stringifies all booleans and integers, and it also converts resource values to the largest applicable unit, i.e. instead of bytes of memory it returns gigabytes.
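The unit mismatch in the diff above (`3gb != 3221225472b`) could in principle be normalized before comparing. A minimal sketch, assuming Elasticsearch's 1024-based byte-size suffixes (`b`, `kb`, `mb`, `gb`, `tb`, `pb`); `parseByteSize` is a hypothetical helper, not operator code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// byteSuffixes maps Elasticsearch byte-size suffixes to their factor.
// Elasticsearch byte sizes are 1024-based. Longer suffixes are listed
// first so that "gb" is matched before the trailing "b".
var byteSuffixes = []struct {
	suffix string
	factor int64
}{
	{"pb", 1 << 50}, {"tb", 1 << 40}, {"gb", 1 << 30},
	{"mb", 1 << 20}, {"kb", 1 << 10}, {"b", 1},
}

// parseByteSize converts strings like "3gb" or "3221225472b" into bytes,
// so two values can be compared regardless of the unit used.
func parseByteSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	for _, u := range byteSuffixes {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("unrecognized byte size: %q", s)
}

func main() {
	a, _ := parseByteSize("3gb")
	b, _ := parseByteSize("3221225472b")
	fmt.Println(a == b) // true: same number of bytes, different units
}
```

This only covers the byte-size case; the stringified booleans and integers in the settings map would need their own normalization, which is part of why the comparison approach feels brittle.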

My fear is that implementing a comparison that mirrors the same transformations would be brittle, and I am thinking we should just stick to the current approach of updating at each reconciliation with an incremented version number.

pebrc avatar Sep 23 '22 07:09 pebrc

I am thinking about ways to optimise this, but it is quite involved. One idea follows below.

First iteration:

  1. PUT desired nodes topology and calculate the hash of the submitted request payload
  2. Store it in an annotation together with the version, for example in the orchestration hints annotation

Subsequent iterations:

  1. GET the _latest desired nodes topology from Elasticsearch and read the stored version and hash from the annotation
  2. Calculate the hash of the expected desired nodes and compare hash and version. If both are the same, stop
  3. If they differ, take the version returned from the GET call, increment it, and PUT the new topology
  4. Update the orchestration hints annotation with the new version and hash

This would address the following concerns:

  • it reduces the number of updates to the Elasticsearch API with identical topologies
  • it handles the case where a third party changes or deletes the desired nodes, by doing a GET before the PUT
  • in steady state no updates are posted to Elasticsearch (e.g. if reconciliation is triggered by a cache refresh, an operator restart, or spec changes that have no relevance for the desired nodes API)

This comes with the downside of additional complexity and extra annotation updates on the ES resource.
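The hash-and-version scheme above could be sketched as follows. This is a hypothetical illustration, not operator code: `hint`, `payloadHash`, and `needsUpdate` are made-up names, and the real orchestration hints annotation may have a different shape.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hint mirrors what could be stored in the orchestration hints
// annotation: the last submitted version and payload hash.
type hint struct {
	Version int    `json:"version"`
	Hash    string `json:"hash"`
}

// payloadHash hashes the request payload that would be PUT to the
// desired nodes API, so later reconciliations can detect no-op updates
// without comparing the ES-transformed values field by field.
func payloadHash(payload any) (string, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate decides whether a PUT is required, given the stored hint,
// the version currently reported by the GET call, and the hash of the
// expected payload.
func needsUpdate(stored hint, esVersion int, expectedHash string) (bool, int) {
	if stored.Hash == expectedHash && stored.Version == esVersion {
		return false, esVersion // steady state: nothing to post
	}
	return true, esVersion + 1 // changed (or changed by a third party): PUT
}

func main() {
	topology := map[string]any{"nodes": []string{"default-0", "default-1"}}
	h, _ := payloadHash(topology)
	stored := hint{Version: 2, Hash: h}

	// Identical payload and matching version: no update needed.
	fmt.Println(needsUpdate(stored, 2, h)) // false 2
}
```

Comparing the hash of what we submitted (rather than what Elasticsearch returns) sidesteps the unit and type transformations discussed earlier, at the cost of the extra annotation writes mentioned above.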

pebrc avatar Sep 23 '22 09:09 pebrc