
Documentation: Best practices for zero-downtime plugin updates

Open grosskur opened this issue 7 months ago • 3 comments

I have an NRI plugin running as a DaemonSet. Ideally, I would like to set spec.updateStrategy.rollingUpdate.maxSurge to a positive number, and spec.updateStrategy.rollingUpdate.maxUnavailable to zero to have "make before break" semantics where on each node the new pod starts up and becomes ready, before the old pod is terminated.

However, for my NRI plugin, it's not safe to have two instances running at the same time acting on the same container, since this would result in an action being performed twice on pod startup rather than once. So for now, I am fully terminating the old pod before starting the new pod to prevent overlap. And when the new pod comes up, it has to "catch up" to process any events that were missed.

It would be helpful if we could document any known patterns for achieving this kind of zero-downtime update scenario for an NRI plugin, in a way that avoids duplicate processing.

grosskur avatar May 06 '25 19:05 grosskur

This actually leads to some other open questions that I have had in my head for a while about upgrades and the overall handling of multiple instances of the same plugin:

  • should a plugin report its version number during the registration phase?
  • should we force-deregister (break the connection to) the older instance of a plugin once a newer version of the same plugin successfully registers and synchronizes state?
  • what about downgrades: how should handover from a previously registered newer version to an older version be performed?
  • or should we ignore plugin versions altogether and only care about registration timestamps? e.g. a newer instance of the same plugin would, after successful registration/sync, trigger a disconnect of the older instance?
  • should the protocol communicate disconnect reasons to plugins, so that one of the reasons could be "a new instance of you successfully registered", allowing graceful shutdown of the older instance?

kad avatar May 07 '25 06:05 kad

> I have an NRI plugin running as a DaemonSet. Ideally, I would like to set spec.updateStrategy.rollingUpdate.maxSurge to a positive number, and spec.updateStrategy.rollingUpdate.maxUnavailable to zero to have "make before break" semantics where on each node the new pod starts up and becomes ready, before the old pod is terminated.
>
> However, for my NRI plugin, it's not safe to have two instances running at the same time acting on the same container, since this would result in an action being performed twice on pod startup rather than once. So for now, I am fully terminating the old pod before starting the new pod to prevent overlap. And when the new pod comes up, it has to "catch up" to process any events that were missed.
>
> It would be helpful if we could document any known patterns for achieving this kind of zero-downtime update scenario for an NRI plugin, in a way that avoids duplicate processing.

Doesn't your plugin also need to handle potentially crashing, getting restarted, and then having to "catch up" on events it has potentially missed? If it already does, isn't an update just a controlled instance of that same situation? I know that this might be a stupid question, depending on the nature of the things your plugin does and the workloads it customizes, but without knowing more about your use case, it begs to be asked.

Anyway, if you really need to implement something like that, you could do it already now, without any help from the NRI infra, with something (a bit hackish wrt. index usage) like this. First, you update your plugin to 'rubber-stamp' the containers it has processed by putting a well-known, plugin-specific annotation on them. You also update your plugin to exit if it sees that annotation on a container it is about to process. Finally (and the ugliest bit of all this), when you roll out an update, you use an index one lower than that of the running version of your plugin, to ensure that the version you are updating sees the rubber stamp and exits.
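To make the pattern concrete, here is a minimal sketch of such a plugin. It assumes the Go stub API from github.com/containerd/nri/pkg/stub and pkg/api with the handler signatures of recent releases (which pass a context.Context); the annotation key, plugin name, and index are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/containerd/nri/pkg/api"
	"github.com/containerd/nri/pkg/stub"
)

// stampKey is a hypothetical, plugin-specific "rubber stamp" annotation.
const stampKey = "my-plugin.example.org/processed"

type plugin struct{}

// CreateContainer applies the plugin's adjustments and rubber-stamps the
// container. If the container was already stamped by another instance
// (the newer version running at a lower index), this instance exits.
func (p *plugin) CreateContainer(_ context.Context, pod *api.PodSandbox, ctr *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	if _, ok := ctr.Annotations[stampKey]; ok {
		// A newer instance has already processed this container: step aside.
		os.Exit(0)
	}

	adjust := &api.ContainerAdjustment{}
	// ... the plugin's real adjustments go here ...
	adjust.AddAnnotation(stampKey, "true")
	return adjust, nil, nil
}

func main() {
	// The index ("10" here) is what the rollout trick above manipulates: the
	// updated version registers itself with a lower index so it runs first.
	s, err := stub.New(&plugin{}, stub.WithPluginName("my-plugin"), stub.WithPluginIdx("10"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := s.Run(context.Background()); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```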

There is an additional twist you could add on top of this to make sure you never run out of lower indices, no matter how many updates you roll out, but it is only worth mentioning if this looks like something you could consider using...

With all that said, it's a good question how much and what type of support we'd need to put into the infra itself to best support plugin updates. Requires some more head scratching...

klihub avatar May 07 '25 14:05 klihub

> You also update your plugin to exit if it sees that annotation on a container it is about to process.

I think this part wouldn't be necessary; you'd just skip containers that have the annotation. Once Kubernetes has brought up the new pod it can kill the old one like normal.

If you're rolling out a behavior change, you might also record your own version in the rubber-stamp annotation, and on sync make sure that all containers which are still running have your expected version + behavior there.
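Continuing the hypothetical sketch from the earlier comment (same skeleton, imports, and main), the exit can be replaced by a skip, and the stamp can carry the plugin version so that Synchronize can catch up on containers that were missed or stamped by an older version:

```go
// Hypothetical stamp key and version; the value recorded in the stamp now
// identifies which plugin version processed the container.
const (
	stampKey      = "my-plugin.example.org/processed"
	pluginVersion = "v2"
)

// CreateContainer skips containers already stamped by any instance, so two
// instances can safely overlap during a rolling update.
func (p *plugin) CreateContainer(_ context.Context, pod *api.PodSandbox, ctr *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	if _, ok := ctr.Annotations[stampKey]; ok {
		return nil, nil, nil // already handled by another instance, nothing to do
	}
	adjust := &api.ContainerAdjustment{}
	// ... the plugin's real adjustments go here ...
	adjust.AddAnnotation(stampKey, pluginVersion)
	return adjust, nil, nil
}

// Synchronize reconciles containers that were missed entirely (e.g. created
// while no instance was running) or were stamped by a different plugin version
// with different behavior.
func (p *plugin) Synchronize(_ context.Context, pods []*api.PodSandbox, ctrs []*api.Container) ([]*api.ContainerUpdate, error) {
	var updates []*api.ContainerUpdate
	for _, ctr := range ctrs {
		if ctr.Annotations[stampKey] == pluginVersion {
			continue // processed by this version, up to date
		}
		// ... reconcile this container, appending to updates if needed ...
	}
	return updates, nil
}
```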

samuelkarp avatar May 23 '25 21:05 samuelkarp