Issue with StatefulSet Rolling Update Strategy
Precursor:
Currently all of our applications are deployed with StatefulSets rather than Deployments. The current UpdateStrategy of our StatefulSets is RollingUpdate (the other option being OnDelete).
Issue:
The combination of Rolling Updates and libcluster makes it so that we can never add new services to the libcluster/Horde registry, because we end up with the following:
- 3 pods running, let's say all on version 1
- Version 1 has libcluster/Horde running, but only Genserver_A is registered with it
- Then we trigger an update of this env to version 2
- In version 2 we have added a Genserver_B that should also be registered in the libcluster/Horde registry
- The Rolling Update will start with pod 2 (out of pods 0, 1, 2) and will not update the other pods to the new version until pod 2 is up and running
- However, when pod 2 starts, libcluster detects pods 0 and 1 using the k8s labels and IPs and tries to register Genserver_B
- But pods 0 and 1 do not have the code for Genserver_B yet, so pod 2 crashes because it cannot start Genserver_B on the pod it is trying to start it on
- And the Rolling Update never proceeds to the other pods because pod 2 never becomes ready
Or at least that is what I think is happening here. For the most part I think I have the issue right, and the error message on the pod that is crashing is:
** (EXIT) an exception was raised:
** (UndefinedFunctionError) function Cogynt.Servers.Workers.CustomFields.start_link/1 is undefined or private
(cogynt 0.1.0) Cogynt.Servers.Workers.CustomFields.start_link([name: {:via, Horde.Registry, {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}])
(horde 0.8.7) lib/horde/processes_supervisor.ex:766: Horde.ProcessesSupervisor.start_child/3
(horde 0.8.7) lib/horde/processes_supervisor.ex:752: Horde.ProcessesSupervisor.handle_start_child/2
(stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
(stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
(stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Even though I know the version on that pod has the code for Cogynt.Servers.Workers.CustomFields.start_link, the error must be referring to one of the other 2 pods that have not received the new version yet.
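For context, the worker is started through a Horde.DynamicSupervisor, which can place the child on any node in the cluster, including one still running the old release. Here is a rough sketch reconstructed from the stack trace above (the supervisor name is an assumption on my part; the registry name comes from the error):

```elixir
# Sketch only: Horde.DynamicSupervisor may start this child on any node it
# knows about, so during a rolling update it can pick a pod whose release
# does not contain Cogynt.Servers.Workers.CustomFields yet.
# Cogynt.Horde.HordeSupervisor is a hypothetical name; the registry name is
# taken from the stack trace.
Horde.DynamicSupervisor.start_child(
  Cogynt.Horde.HordeSupervisor,
  {Cogynt.Servers.Workers.CustomFields,
   name:
     {:via, Horde.Registry,
      {Cogynt.Horde.HordeRegistry, Cogynt.Servers.Workers.CustomFields}}}
)
```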
Has anyone else ever run into this problem?
I think you would have the same issue even if you used a Deployment; the new pods would crash-loop anyway.
The way we solved this is: we have our own implementation of the Kubernetes strategy that polls the k8s API and only joins nodes of the same version (from the version label) into a cluster. This makes it impossible to hand off state between application versions, but it makes sure that code that was never tested to co-exist in a cluster won't end up crashing in production.
@AndrewDryga do you think this custom k8s strategy is worth a PR, or could it be shared? Because I feel like, how are more people not running into this same issue? Is everyone else using this library only ever deploying libcluster once and then never adding new features to its registry from that point forward?
@amacciola the problem with our strategy is that it is very opinionated (uses specific labels named for our environment, uses node names from k8s labels, etc). I will think about open-sourcing it, but it's a relatively easy change: just leverage labelSelector in the get_nodes/4 callback implementation and query only the pods of a specific version.
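To make that concrete, here is a minimal sketch of what a version-aware labelSelector could look like. The app and version label names, and the :cogynt application used to read the running version, are assumptions for illustration and not part of libcluster:

```elixir
defmodule Cogynt.Cluster.VersionedSelector do
  @moduledoc """
  Sketch: build a Kubernetes labelSelector that restricts pod discovery to
  pods running the same application version as this node.
  """

  # Reads the running release version of the (assumed) :cogynt app, e.g. "0.1.0".
  def current_version do
    Application.spec(:cogynt, :vsn) |> to_string()
  end

  # Produces e.g. "app=cogynt,version=0.1.0".
  def label_selector(base \\ "app=cogynt") do
    base <> ",version=" <> current_version()
  end

  # The pods list endpoint a custom strategy could query instead of an
  # unfiltered listing, e.g.
  # /api/v1/namespaces/<ns>/pods?labelSelector=app%3Dcogynt%2Cversion%3D0.1.0
  def pods_path(namespace) do
    "/api/v1/namespaces/#{namespace}/pods?labelSelector=#{URI.encode_www_form(label_selector())}"
  end
end
```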
@AndrewDryga okay I will try this. So if I am trying to extend the k8s DNS strategy here: https://github.com/bitwalker/libcluster/blob/main/lib/strategy/kubernetes_dns.ex you are suggesting that I need to tweak the get_nodes function here: https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes_dns.ex#L107 so that it also queries for a specific version, or at least only matching version numbers?
@amacciola you can't extract that information from the DNS server; instead you should modify that function in lib/strategy/kubernetes.ex. The k8s API returns a lot of information about the pod, including the labels you need to use to store the version.
@AndrewDryga I see. So it's changing https://github.com/bitwalker/libcluster/blob/5240d23be9573bad51195a737a234a5b9e9eec28/lib/strategy/kubernetes.ex#L232-L252 to include additional params so that it only returns info for pods of a certain version?
@amacciola yes, you want to query for pods and return only the ones that match your current version.
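A sketch of that filtering idea, assuming the decoded JSON pod list from that API call; the "version" label, the :cogynt app name, and the node basename are illustrative assumptions:

```elixir
defmodule Cogynt.Cluster.VersionFilter do
  # Sketch of a post-filter that a copied get_nodes/4 could apply: keep only
  # pods whose "version" label matches this node's release version, then build
  # node names from the pod IPs (similar to the strategy's :ip mode).
  def nodes_for_current_version(pods, basename \\ "cogynt") do
    current_version = Application.spec(:cogynt, :vsn) |> to_string()

    pods
    |> Enum.filter(fn pod ->
      get_in(pod, ["metadata", "labels", "version"]) == current_version
    end)
    |> Enum.map(fn pod ->
      :"#{basename}@#{get_in(pod, ["status", "podIP"])}"
    end)
  end
end
```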
@AndrewDryga I am working on testing this new strategy out now, so thanks for the insight. But I just wanted to make sure I understood how some of the libcluster code, combined with the Horde registry, works under the hood.
If we have 3 pods running for the same application, and each of these pods has, let's say, server_1 registered in the HordeRegistry:
- pod_1 with version_1
- pod_2 with version_1 -> running server_1
- pod_3 with version_1
If we then trigger an update to version_2, which contains a new service, server_2, that gets registered in the Horde Registry, and the update starts with pod_3: when pod_3 comes online with the new version and finds pod_1 and pod_2, does it just pick one of the 3 pods to try and start the new service on? So you would have a 2-in-3 chance that it tries to start the new service on a pod with version_1 and not version_2?
I'm not using Horde, but the pods with version 1 would not see pods with version 2 in the Erlang cluster, so basically, for each of the islands (one per version), everything would behave like it's a cluster with the same codebase. If you have globally unique jobs, it also means that you will have two of those workers started (one per island).
For now I have just created a separate HordeRegistry for each GenServer we want to leverage the libcluster strategies for. As long as we don't have too many, it's only a minor annoyance to fix this issue this way.
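For anyone else hitting this, the workaround looks roughly like this in the supervision tree (all module and registry names below are illustrative, not the actual Cogynt tree):

```elixir
# Sketch of the per-GenServer workaround: each worker gets its own
# Horde.Registry and Horde.DynamicSupervisor, so a later release that adds a
# new worker only adds new names instead of changing an already-running
# shared registry.
children = [
  {Horde.Registry, name: Cogynt.Horde.ServerOneRegistry, keys: :unique, members: :auto},
  {Horde.DynamicSupervisor,
   name: Cogynt.Horde.ServerOneSupervisor, strategy: :one_for_one, members: :auto},
  {Horde.Registry, name: Cogynt.Horde.ServerTwoRegistry, keys: :unique, members: :auto},
  {Horde.DynamicSupervisor,
   name: Cogynt.Horde.ServerTwoSupervisor, strategy: :one_for_one, members: :auto}
]

Supervisor.start_link(children, strategy: :one_for_one, name: Cogynt.ClusterSupervisor)
```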