
Issue with StatefulSet Rolling Update Strategy

Open amacciola opened this issue 2 years ago • 10 comments

Background:

Currently all of our applications are deployed as StatefulSets rather than as Deployments. The current UpdateStrategy of our StatefulSets is RollingUpdate. Here is an explanation of what it does compared to the other option we have: (screenshot of the StatefulSet update-strategy documentation)

Issue:

The combination of rolling updates and libcluster makes it so that we can never add new services to the libcluster/Horde registry, because we end up with the following sequence:

  1. 3 pods running, all on (let's say) version 1
  2. Version 1 has libcluster/Horde running, but only GenServer_A is registered with it
  3. Then we trigger an update of this environment to version 2
  4. In version 2 we have added a GenServer_B that should also be registered in the libcluster/Horde registry
  5. The rolling update starts with pod 2 (out of pods 0, 1, 2), and it will not update the other pods to the new version until pod 2 is up and running
  6. However, when pod 2 starts, libcluster detects pods 0 and 1 via the k8s labels and IPs and tries to register GenServer_B
  7. But pods 0 and 1 do not have the code for GenServer_B yet, so pod 2 crashes because it cannot start GenServer_B on the pod it is trying to
  8. And the rolling update never proceeds to the other pods because pod 2 never becomes healthy

Or at least that is what I think is happening here. For the most part I believe I have the issue correct, and the error message on the crashing pod is:

        ** (EXIT) an exception was raised:
            ** (UndefinedFunctionError) function Cogynt.Servers.Workers.CustomFields.start_link/1 is undefined or private
                (cogynt 0.1.0) Cogynt.Servers.Workers.CustomFields.start_link([name: {:via, Horde.Registry, {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}])
                (horde 0.8.7) lib/horde/processes_supervisor.ex:766: Horde.ProcessesSupervisor.start_child/3
                (horde 0.8.7) lib/horde/processes_supervisor.ex:752: Horde.ProcessesSupervisor.handle_start_child/2
                (stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
                (stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
                (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

Even though I know the version on that pod has the code for Cogynt.Servers.Workers.CustomFields.start_link, so it must be referring to one of the other 2 pods that have not received the new version yet.
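For reference, the worker is started through Horde roughly like this (a sketch; the supervisor name is an assumption, only the registry name appears in the trace above). Horde can place that child on any member of the distributed supervisor, including a pod still running the old release where the module is not loaded:

    # Sketch (not the actual Cogynt code) of how the worker ends up being started.
    # Horde.DynamicSupervisor picks a member node for the new child, so this call
    # can land on a pod that does not have the new module yet.
    Horde.DynamicSupervisor.start_child(
      Cogynt.Horde.HordeSupervisor,   # supervisor name is an assumption
      {Cogynt.Servers.Workers.CustomFields,
       [name: {:via, Horde.Registry,
               {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}]}
    )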

Has anyone else ever run into this problem?

amacciola avatar Jul 21 '22 16:07 amacciola

I think you would have the same issue even if you used a Deployment; the new pods would crash-loop anyway.

The way we solved this: we have our own implementation of the Kubernetes strategy that polls the k8s API and only joins nodes of the same version (taken from a version label) into a cluster. This makes it impossible to hand off state between application versions, but it ensures that code that was never tested to co-exist in a cluster does not end up crashing in production.
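A minimal skeleton of what such a strategy could look like (a sketch, not the production implementation; the module name and version label are illustrative, and the Kubernetes API query is left as a stub):

    # Sketch of a custom libcluster strategy that only connects same-version nodes.
    # The Kubernetes API query is stubbed out; a real implementation would call the
    # pods endpoint with a labelSelector on the version label.
    defmodule MyApp.Cluster.SameVersionKubernetes do
      use GenServer
      use Cluster.Strategy

      alias Cluster.Strategy.State

      def start_link([%State{} = state]), do: GenServer.start_link(__MODULE__, state)

      def init(state), do: {:ok, state, {:continue, :poll}}

      def handle_continue(:poll, state), do: {:noreply, poll(state)}

      def handle_info(:poll, state), do: {:noreply, poll(state)}

      defp poll(state) do
        nodes = list_same_version_nodes(state.config)

        Cluster.Strategy.connect_nodes(state.topology, state.connect, state.list_nodes, nodes)

        Process.send_after(self(), :poll, Keyword.get(state.config, :polling_interval, 5_000))
        state
      end

      # Stub: query the k8s API for pods whose "version" label matches this
      # release's version and turn them into node names (e.g. :"app@10.0.0.1").
      defp list_same_version_nodes(_config), do: []
    end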

AndrewDryga avatar Jul 31 '22 01:07 AndrewDryga

@AndrewDryga do you think this custom k8s strategy is worth a PR, or could it be shared? Because I wonder how more people are not running into this same issue. Is everyone else using this library only deploying libcluster once and then never adding new features to its registry from that point forward?

amacciola avatar Jul 31 '22 04:07 amacciola

@amacciola the problem with our strategy is that it is very opinionated (it uses specific labels named for our environment, node names from k8s labels, etc.). I will think about open-sourcing it, but it's a relatively easy change: just leverage labelSelector in the get_nodes/4 implementation and query only the pods of a specific version.
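A lighter variant of the same idea may also work with the stock Cluster.Strategy.Kubernetes, since it already supports a kubernetes_selector option that is passed to the API as a labelSelector. A sketch, assuming the deployment stamps each pod with a version label and exposes it to the app as an APP_VERSION environment variable:

    # config/runtime.exs — a sketch; the "cogynt" basename, the version label and
    # the APP_VERSION env var are assumptions about how the pods are configured.
    import Config

    config :libcluster,
      topologies: [
        cogynt: [
          strategy: Cluster.Strategy.Kubernetes,
          config: [
            mode: :ip,
            kubernetes_node_basename: "cogynt",
            kubernetes_namespace: "default",
            # only cluster with pods that carry the same version label
            kubernetes_selector: "app=cogynt,version=#{System.get_env("APP_VERSION")}"
          ]
        ]
      ]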

AndrewDryga avatar Jul 31 '22 06:07 AndrewDryga

@AndrewDryga okay, I will try this. So if I am trying to extend the k8s DNS strategy here: https://github.com/bitwalker/libcluster/blob/main/lib/strategy/kubernetes_dns.ex

you are suggesting that I need to tweak the get_nodes function here:

https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes_dns.ex#L107

so that it additionally queries for a specific version, or at least only for matching version numbers?

amacciola avatar Aug 03 '22 18:08 amacciola

@amacciola you can't extract that information from the DNS server; instead you should modify that function in lib/strategy/kubernetes.ex. The K8s API returns a lot of information about each pod, including the labels you would use to store the version.
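The filtering step in a forked get_nodes could then look roughly like this (a sketch over the decoded pod-list response; the "version" label and the APP_VERSION environment variable are assumptions about the deployment):

    # Sketch: keep only pods whose "version" label matches this node's version.
    defmodule MyApp.Cluster.VersionFilter do
      def same_version_pods(%{"items" => pods}) do
        my_version = System.get_env("APP_VERSION")

        Enum.filter(pods, fn pod ->
          get_in(pod, ["metadata", "labels", "version"]) == my_version
        end)
      end
    end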

AndrewDryga avatar Aug 03 '22 19:08 AndrewDryga

@AndrewDryga I see. So it's changing

https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes.ex#L232-L252

so that it includes additional params and only returns info for pods with a certain version

amacciola avatar Aug 03 '22 19:08 amacciola

@amacciola yes, you want to query for pods and return only the ones that match your current version

AndrewDryga avatar Aug 04 '22 17:08 AndrewDryga

@AndrewDryga I am working on testing this new strategy out now, so thanks for the insight. But I just wanted to make sure I understood how some of the libcluster code combined with the Horde registry works under the hood.

Say we have 3 pods running for the same application, and each of these pods has (let's say) server_1 registered in the HordeRegistry:

  • pod_1 with version_1
  • pod_2 with version_1 -> running server_1
  • pod_3 with version_1

We then trigger an update to version_2, which contains a new service, server_2, that also gets registered in the Horde registry. Let's say the update starts with pod_3, and when pod_3 comes online with the new version it finds pod_1 and pod_2.

Does it just pick one of the 3 pods to try to start the new service on? So you would have a 2-in-3 chance that it tries to start the new service on a pod with version_1 and not version_2?
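(For reference on the Horde side: placement of a new child is decided by the supervisor's :distribution_strategy, Horde.UniformDistribution by default, which picks one of the current members via consistent hashing of the child spec, so in a mixed-version 3-node cluster the child can indeed land on any of the 3 pods. A sketch, with an assumed supervisor name, of where that option lives in the supervision tree:)

    # Sketch (supervisor name assumed): the :distribution_strategy decides which
    # member node each new child is started on, regardless of which pod called
    # start_child.
    {Horde.DynamicSupervisor,
     [
       name: Cogynt.Horde.HordeSupervisor,
       strategy: :one_for_one,
       distribution_strategy: Horde.UniformDistribution
     ]}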

amacciola avatar Aug 10 '22 20:08 amacciola

I'm not using Horde but the pods with version 1 would not see pods with version 2 in the Erlang cluster, so basically, for each of the islands (one per version), everything would behave like it's a cluster with the same codebase. If you have globally unique jobs it also means that you will have two of the workers started (one per island).

AndrewDryga avatar Aug 12 '22 01:08 AndrewDryga

For now I have just created a separate HordeRegistry for each GenServer that we want to leverage the libcluster strategies for. As long as we don't have too many of them, it's only a minor annoyance to fix this issue that way.
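A rough sketch of what that layout looks like in the application supervision tree (module names are illustrative, not the actual Cogynt ones):

    # Sketch: one Horde.Registry per cluster-wide worker, started from the
    # application supervision tree. Names are illustrative.
    defmodule Cogynt.Application do
      use Application

      def start(_type, _args) do
        children = [
          {Horde.Registry, [name: Cogynt.Horde.RegistryA, keys: :unique]},
          {Horde.Registry, [name: Cogynt.Horde.RegistryB, keys: :unique]}
        ]

        Supervisor.start_link(children, strategy: :one_for_one, name: Cogynt.Supervisor)
      end
    end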

amacciola avatar Aug 25 '22 14:08 amacciola