
Issue with StatefulSet Rolling Update Strategy

Open amacciola opened this issue 2 years ago • 10 comments

Background:

Currently all of our applications are deployed as StatefulSets rather than as Deployments. The current UpdateStrategy of our StatefulSets is RollingUpdate. Here is an explanation of what it does compared to the other option we have: (screenshot of the StatefulSet update-strategy documentation)

Issue:

The combination of rolling updates and libcluster makes it so that we can never add new services to the libcluster/Horde registry, because we end up with the following sequence:

  1. 3 pods running, all on (let's say) version 1
  2. Version 1 has libcluster/Horde running, but only GenServer_A is registered with it
  3. Then we trigger an update of this environment to version 2
  4. In version 2 we have added a GenServer_B that should also be registered in the libcluster/Horde registry
  5. The rolling update starts with pod 2 (out of pods 0, 1, 2), and it will not update the other pods to the new version until pod 2 is up and running
  6. However, when pod 2 starts, libcluster detects pods 0 and 1 via the k8s labels and IPs and tries to register GenServer_B
  7. But pods 0 and 1 do not have the code for GenServer_B yet, so pod 2 crashes because it cannot start GenServer_B on the pod it is trying to
  8. And the rolling update never proceeds to the other pods because pod 2 never becomes healthy

Or at least that is what I think is happening here. For the most part I believe I have the issue correct, and the error message on the crashing pod is:

        ** (EXIT) an exception was raised:
            ** (UndefinedFunctionError) function Cogynt.Servers.Workers.CustomFields.start_link/1 is undefined or private
                (cogynt 0.1.0) Cogynt.Servers.Workers.CustomFields.start_link([name: {:via, Horde.Registry, {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}])
                (horde 0.8.7) lib/horde/processes_supervisor.ex:766: Horde.ProcessesSupervisor.start_child/3
                (horde 0.8.7) lib/horde/processes_supervisor.ex:752: Horde.ProcessesSupervisor.handle_start_child/2
                (stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
                (stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
                (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

Even though I know the version on that pod has the code for Cogynt.Servers.Workers.CustomFields.start_link, so it must be referring to one of the other 2 pods that have not received the new version yet.
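For reference, the worker is started through Horde roughly like this (a sketch; the supervisor name is an assumption, only the registry name appears in the trace above). Horde can place that child on any member of the distributed supervisor, including a pod still running the old release where the module is not loaded:

    # Sketch (not the actual Cogynt code) of how the worker ends up being started.
    # Horde.DynamicSupervisor picks a member node for the new child, so this call
    # can land on a pod that does not have the new module yet.
    Horde.DynamicSupervisor.start_child(
      Cogynt.Horde.HordeSupervisor,   # supervisor name is an assumption
      {Cogynt.Servers.Workers.CustomFields,
       [name: {:via, Horde.Registry,
               {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}]}
    )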

Has anyone else ever run into this problem?

amacciola avatar Jul 21 '22 16:07 amacciola

I think you would have the same issue even if you used a Deployment; the new pods would crash-loop anyway.

The way we solved this: we have our own implementation of the Kubernetes strategy that polls the k8s API and only joins nodes of the same version (taken from a version label) into a cluster. This makes it impossible to hand off state between application versions, but it ensures that code that was never tested to co-exist in a cluster does not end up crashing in production.
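A minimal skeleton of what such a strategy could look like (a sketch, not the production implementation; the module name and version label are illustrative, and the Kubernetes API query is left as a stub):

    # Sketch of a custom libcluster strategy that only connects same-version nodes.
    # The Kubernetes API query is stubbed out; a real implementation would call the
    # pods endpoint with a labelSelector on the version label.
    defmodule MyApp.Cluster.SameVersionKubernetes do
      use GenServer
      use Cluster.Strategy

      alias Cluster.Strategy.State

      def start_link([%State{} = state]), do: GenServer.start_link(__MODULE__, state)

      def init(state), do: {:ok, state, {:continue, :poll}}

      def handle_continue(:poll, state), do: {:noreply, poll(state)}

      def handle_info(:poll, state), do: {:noreply, poll(state)}

      defp poll(state) do
        nodes = list_same_version_nodes(state.config)

        Cluster.Strategy.connect_nodes(state.topology, state.connect, state.list_nodes, nodes)

        Process.send_after(self(), :poll, Keyword.get(state.config, :polling_interval, 5_000))
        state
      end

      # Stub: query the k8s API for pods whose "version" label matches this
      # release's version and turn them into node names (e.g. :"app@10.0.0.1").
      defp list_same_version_nodes(_config), do: []
    end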

AndrewDryga avatar Jul 31 '22 01:07 AndrewDryga

@AndrewDryga do you think this custom k8s strategy is worth a PR, or could it be shared? Because I wonder how more people are not running into this same issue. Is everyone else using this library only deploying libcluster once and then never adding new features to its registry from that point forward?

amacciola avatar Jul 31 '22 04:07 amacciola

@amacciola the problem with our strategy is that it is very opinionated (it uses specific labels named for our environment, node names from k8s labels, etc.). I will think about open-sourcing it, but it's a relatively easy change: just leverage labelSelector in the get_nodes/4 implementation and query only the pods of a specific version.
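A lighter variant of the same idea may also work with the stock Cluster.Strategy.Kubernetes, since it already supports a kubernetes_selector option that is passed to the API as a labelSelector. A sketch, assuming the deployment stamps each pod with a version label and exposes it to the app as an APP_VERSION environment variable:

    # config/runtime.exs — a sketch; the "cogynt" basename, the version label and
    # the APP_VERSION env var are assumptions about how the pods are configured.
    import Config

    config :libcluster,
      topologies: [
        cogynt: [
          strategy: Cluster.Strategy.Kubernetes,
          config: [
            mode: :ip,
            kubernetes_node_basename: "cogynt",
            kubernetes_namespace: "default",
            # only cluster with pods that carry the same version label
            kubernetes_selector: "app=cogynt,version=#{System.get_env("APP_VERSION")}"
          ]
        ]
      ]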

AndrewDryga avatar Jul 31 '22 06:07 AndrewDryga

@AndrewDryga okay, I will try this. So if I am trying to extend the k8s DNS strategy here: https://github.com/bitwalker/libcluster/blob/main/lib/strategy/kubernetes_dns.ex

you are suggesting that I need to tweak the get_nodes function here:

https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes_dns.ex#L107

so that it additionally queries for a specific version, or at least only for matching version numbers?

amacciola avatar Aug 03 '22 18:08 amacciola

@amacciola you can't extract that information from the DNS server; instead you should modify that function in lib/strategy/kubernetes.ex. The K8s API returns a lot of information about each pod, including the labels you would use to store the version.
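The filtering step in a forked get_nodes could then look roughly like this (a sketch over the decoded pod-list response; the "version" label and the APP_VERSION environment variable are assumptions about the deployment):

    # Sketch: keep only pods whose "version" label matches this node's version.
    defmodule MyApp.Cluster.VersionFilter do
      def same_version_pods(%{"items" => pods}) do
        my_version = System.get_env("APP_VERSION")

        Enum.filter(pods, fn pod ->
          get_in(pod, ["metadata", "labels", "version"]) == my_version
        end)
      end
    end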

AndrewDryga avatar Aug 03 '22 19:08 AndrewDryga

@AndrewDryga I see. So it's changing

https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes.ex#L232-L252

so that it includes additional params and only returns info for pods with a certain version

amacciola avatar Aug 03 '22 19:08 amacciola

@amacciola yes, you want to query for pods and return only the ones that match your current version

AndrewDryga avatar Aug 04 '22 17:08 AndrewDryga

@AndrewDryga I am working on testing this new strategy out now, so thanks for the insight. But I just wanted to make sure I understood how some of the libcluster code combined with the Horde registry works under the hood.

Say we have 3 pods running for the same application, and each of these pods has (let's say) server_1 registered in the HordeRegistry:

  • pod_1 with version_1
  • pod_2 with version_1 -> running server_1
  • pod_3 with version_1

We then trigger an update to version_2, which contains a new service, server_2, that also gets registered in the Horde registry. Let's say the update starts with pod_3, and when pod_3 comes online with the new version it finds pod_1 and pod_2.

Does it just pick one of the 3 pods to try to start the new service on? So you would have a 2-in-3 chance that it tries to start the new service on a pod with version_1 and not version_2?
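(For reference on the Horde side: placement of a new child is decided by the supervisor's :distribution_strategy, Horde.UniformDistribution by default, which picks one of the current members via consistent hashing of the child spec, so in a mixed-version 3-node cluster the child can indeed land on any of the 3 pods. A sketch, with an assumed supervisor name, of where that option lives in the supervision tree:)

    # Sketch (supervisor name assumed): the :distribution_strategy decides which
    # member node each new child is started on, regardless of which pod called
    # start_child.
    {Horde.DynamicSupervisor,
     [
       name: Cogynt.Horde.HordeSupervisor,
       strategy: :one_for_one,
       distribution_strategy: Horde.UniformDistribution
     ]}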

amacciola avatar Aug 10 '22 20:08 amacciola

I'm not using Horde but the pods with version 1 would not see pods with version 2 in the Erlang cluster, so basically, for each of the islands (one per version), everything would behave like it's a cluster with the same codebase. If you have globally unique jobs it also means that you will have two of the workers started (one per island).

AndrewDryga avatar Aug 12 '22 01:08 AndrewDryga

For now I have just created a separate HordeRegistry for each GenServer that we want to leverage the libcluster strategies for. As long as we don't have too many of them, it's only a minor annoyance to fix this issue that way.
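A rough sketch of what that layout looks like in the application supervision tree (module names are illustrative, not the actual Cogynt ones):

    # Sketch: one Horde.Registry per cluster-wide worker, started from the
    # application supervision tree. Names are illustrative.
    defmodule Cogynt.Application do
      use Application

      def start(_type, _args) do
        children = [
          {Horde.Registry, [name: Cogynt.Horde.RegistryA, keys: :unique]},
          {Horde.Registry, [name: Cogynt.Horde.RegistryB, keys: :unique]}
        ]

        Supervisor.start_link(children, strategy: :one_for_one, name: Cogynt.Supervisor)
      end
    end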

amacciola avatar Aug 25 '22 14:08 amacciola