
Recommendations around disruption?

Open sdhull opened this issue 11 months ago • 6 comments

Every time a pod is disrupted (rescheduled, new deploy), clients must detect that the old pods are dead and connect to the new pods (right?). This creates the potential for missed messages. What is the recommendation to mitigate this?

sdhull avatar Jan 27 '25 22:01 sdhull

Check out https://docs.anycable.io/anycable-go/reliable_streams

clients must detect that the old pods are dead and connect to the new pods (right?).

Yeah, only if you re-deploy anycable pods; you don't need to do that when you re-deploy your application pods.
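For reference, a minimal sketch of what enabling reliable streams could look like on the anycable-go side, written as plain container env vars (the variable names follow the AnyCable docs; the Redis URL and the way your Helm values map onto env vars are assumptions to adapt):

```yaml
# Sketch: anycable-go container env for reliable streams.
# Reliable streams need a broker plus an intra-cluster pub/sub and a
# compatible broadcast adapter (http or redisx); see the reliable_streams docs.
env:
  - name: ANYCABLE_BROKER
    value: "memory"                 # keep a short stream history in each pod
  - name: ANYCABLE_PUBSUB
    value: "redis"                  # pub/sub used to re-distribute messages between pods
  - name: ANYCABLE_BROADCAST_ADAPTER
    value: "redisx"                 # broadcasts are consumed once, then fanned out internally
  - name: ANYCABLE_REDIS_URL
    value: "redis://redis:6379/0"   # placeholder URL
```

Note that clients also need to speak the extended AnyCable protocol to catch up on missed messages after reconnecting.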

palkan avatar Jan 27 '25 23:01 palkan

Yeah, only if you re-deploy anycable pods; you don't need to do that when you re-deploy your application pods.

Right. But we're using Karpenter to optimize resource allocation, and apparently with our configuration, the anycable-go pods are being shuffled to other nodes fairly frequently (5-10 times daily).

I'll look into the reliable streams docs (sounds very promising!) but it will require some additional config & testing before we can roll it out. In the meantime, should we be attempting to configure k8s/karpenter to minimize anycable-go disruptions?

sdhull avatar Jan 28 '25 00:01 sdhull

Yeah, it's better to keep anycable pods running for longer to avoid disconnections.
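A sketch of how that could be done with standard Kubernetes/Karpenter knobs (the annotation and PDB below are generic, the labels and counts are placeholders, and the annotation name depends on your Karpenter version):

```yaml
# Sketch: on the anycable-go pod template, ask Karpenter not to voluntarily
# disrupt these pods during consolidation (newer Karpenter versions use
# karpenter.sh/do-not-disrupt; older ones used karpenter.sh/do-not-evict).
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
---
# Sketch: a PodDisruptionBudget so voluntary disruptions (node drains, etc.)
# never take down all anycable-go pods at once.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: anycable-go
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: anycable-go
```

Neither setting removes disruptions entirely (node failures and deploys still happen), so reliable streams remain the longer-term fix for missed messages.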

palkan avatar Jan 28 '25 01:01 palkan

Super dumb question on this @palkan, but if we enable HPA, is it like a leader election sort of deal? Or will, say, a 3-pod HPA configuration broadcast events 3 times?

huynhj93 avatar Jan 28 '25 22:01 huynhj93

@huynhj93 Clients (e.g., browsers) will connect to one of the anycable-go pods and subscribe to one or more channels. When a message is broadcast to a channel, each pod will find the clients subscribed to that channel and send each of them the message. A client is only connected to one pod at a time, so clients will only get the message (at most) once.

The issue is when the client thinks it is connected but the pod has actually been torn down because it's being rescheduled. If a message is then sent on a channel the client is subscribed to, the client will never get it, because it hasn't yet detected that its connection is dead (in between heartbeats) and so hasn't reconnected to the new/other pod.

sdhull avatar Jan 28 '25 23:01 sdhull

3 pods HPA configuration broadcast events 3 times?

We have a pub/sub component responsible for distributing messages within the cluster. When you perform a broadcast from the application, you either send the message directly to AnyCable pub/sub (if you use the Redis or NATS broadcast adapter), or the message is processed by some node and then re-distributed within the cluster by AnyCable itself (if you use the HTTP or Redis X broadcast adapter). The diagram shows the latter case.
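To make the two paths concrete, here is a hedged sketch of the env vars that select each mode (variable names per the AnyCable docs; the Redis URL is a placeholder):

```yaml
# Option A: the app publishes directly to the shared pub/sub (Redis or NATS
# broadcast adapter); every anycable-go pod is subscribed and fans the
# message out to its own connected clients.
- name: ANYCABLE_BROADCAST_ADAPTER
  value: "redis"                  # or "nats"
- name: ANYCABLE_REDIS_URL
  value: "redis://redis:6379/0"

# Option B: the app hands the broadcast to a single node (HTTP or Redis X
# adapter), which then re-distributes it to the other pods via AnyCable's
# own pub/sub component.
- name: ANYCABLE_BROADCAST_ADAPTER
  value: "http"                   # or "redisx"
- name: ANYCABLE_PUBSUB
  value: "redis"
```

Either way, a client only receives a given broadcast once, since it is connected to exactly one pod.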

palkan avatar Jan 29 '25 19:01 palkan