etcd-operator

Guess etcd replicas number function

Open kvaps opened this issue 1 year ago • 6 comments

According to the minutes of the latest meeting (2024-06-18), we decided that we need a function that guesses the required number of etcd replicas.

It can be used for recovering a non-existing STS object and also for scaling from 0. Design ref: https://github.com/aenix-io/etcd-operator/pull/181

Proposal:

  • Create variable guessed=0
  • Check cluster-state configmap
    • if configmap exists and initial-cluster-members defined
      • if there are any hostnames defined in initial-cluster-members
        • take the hostname of pod with highest number and +1
          • save value into guessed variable
  • Check endpoints for etcd-headless service
    • if there are any endpoints
      • connect to the cluster using endpoint and collect information from member list
        • if there are any members in output from etcd
          • take the hostname with highest number and +1
            • if value is greater than value in guessed, save value into guessed variable
      • read endpoints from Kubernetes object:
        • take the hostname of the pod for endpoint with highest number and +1
          • if value is greater than value in guessed, save value into guessed variable
  • read persistent volume claims that fall under StatefulSet label selector
    • if there are any PVCs
      • take the name of the PVC with highest number and +1
        • if value is greater than value in guessed, save value into guessed variable
  • read pods that fall under StatefulSet label selector
    • if there are any pods
      • take the pod name with highest number and +1
        • if value is greater than value in guessed, save value into guessed variable
  • return guessed

kvaps avatar Jun 19 '24 13:06 kvaps

I would definitely like to drop these steps altogether.

Check cluster-state configmap

  • if configmap exists and initial-cluster-members defined

    • if there are any hostnames defined in initial-cluster-members

      • take the hostname of pod with highest number and +1

        • save value into guessed variable

This seems redundant, as we already have this info from checking the Endpoints object:

read pods that fall under StatefulSet label selector

  • if there are any pods

    • take the pod name with highest number and +1

      • if value is greater than value in guessed, save value into guessed variable

I don't like this step at all:

if value is greater than value in guessed, save value into guessed variable

IMO, if we found a value from a reliable source, such as member list, we should never fall back to a less reliable source, such as "number of endpoints". Only if the more reliable source is unavailable (e.g. we cannot get member list due to lack of quorum), should we try guessing the right number of replicas from Endpoints or PVCs.

lllamnyp avatar Jun 19 '24 13:06 lllamnyp

@lllamnyp

I would definitely like to drop these steps:

Check cluster-state configmap

It is created at initialization and keeps existing all the time. It should always contain correct information until someone removes it, so why not use it?

read pods that fall under StatefulSet label selector

This seems redundant, as we already have this info from checking the Endpoints object

Do all our pods always end up in the service endpoints? If so, it can be omitted. Also, is there any chance that the service and endpoints will not exist when this check runs?

If we consider the member list a reliable source, then you're right, let's return it directly.

v2:

  • Create variable guessed=0
  • Check endpoints for etcd-headless service
    • if there are any endpoints
      • connect to the cluster using endpoint and collect information from member list
        • if there are any members in output from etcd
          • take the hostname with highest number and +1
            • return value
      • read endpoints from Kubernetes object:
        • take the hostname of the pod for endpoint with highest number and +1
          • if value is greater than value in guessed, save value into guessed variable
  • Check cluster-state configmap
    • if configmap exists and initial-cluster-members defined
      • if there are any hostnames defined in initial-cluster-members
        • take the hostname of pod with highest number and +1
          • save value into guessed variable
  • read persistent volume claims that fall under StatefulSet label selector
    • if there are any PVCs
      • take the name of the PVC with highest number and +1
        • if value is greater than value in guessed, save value into guessed variable
  • return guessed

kvaps avatar Jun 19 '24 13:06 kvaps

The etcd-headless service will always have endpoints: it doesn't rely on readiness probes, so all created pods with IP addresses will be in the headless service. This service is ensured at the very beginning, so it must exist.

I personally do not like checking the cluster-state configmap, because in the past we agreed that it is some kind of cache and that it would be nice to get this info from the etcd PVCs. So the number of PVCs is, in my opinion, a more reliable source than the cluster-state configmap. The configmap can be checked, but only as a last resort.

Kirill-Garbar avatar Jun 19 '24 13:06 Kirill-Garbar

Okay, it seems the cluster-state configmap check makes no sense, so I removed it:

v3:

  • Create variable guessed=0
  • Check endpoints for etcd-headless service
    • if there are any endpoints
      • connect to the cluster using endpoint and collect information from member list
        • if there are any members in output from etcd
          • take the hostname with highest number and +1
            • return value
      • read endpoints from Kubernetes object:
        • take the hostname of the pod for endpoint with highest number and +1
          • if value is greater than value in guessed, save value into guessed variable
  • read persistent volume claims that fall under StatefulSet label selector
    • if there are any PVCs
      • take the name of the PVC with highest number and +1
        • if value is greater than value in guessed, save value into guessed variable
  • return guessed

kvaps avatar Jun 19 '24 14:06 kvaps

Okay, it seems the cluster-state configmap check makes no sense, so I removed it:

v3:

  • return guessed

LGTM

lllamnyp avatar Jul 09 '24 14:07 lllamnyp

This function is tentatively implemented here as

func (o *observables) desiredReplicas() (max int) {}

lllamnyp avatar Jul 29 '24 21:07 lllamnyp