java-operator-sdk

Generic Controller to Detect if Pod was Evicted by Node Upgrade

Open csviri opened this issue 1 year ago • 6 comments

During a node upgrade, pods get drained from the node. For long-running applications where frequent restarts are not desirable, it would be useful to get information about the reason for a pod eviction, especially if it was caused by a node upgrade.

This could be solved with a generic controller that watches pods and nodes and, when a pod is evicted, checks whether its node is being drained and notifies a listener interface about the pod.
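A minimal sketch of that idea in plain Java, without any Kubernetes client dependency. Every name here (PodInfo, NodeInfo, PodEvictionListener, EvictionDetector) is hypothetical and not part of java-operator-sdk. It also assumes that "node is being drained" can be approximated by the node being marked unschedulable (cordoned), which is a simplification: a cordoned node is not necessarily being upgraded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EvictionSketch {

    // Hypothetical, simplified views of the watched resources.
    public record PodInfo(String name, String nodeName, boolean evicted) {}
    public record NodeInfo(String name, boolean unschedulable) {}

    // Extension point: how the notification is delivered is left to the implementer.
    public interface PodEvictionListener {
        void onPodEvictedDuringNodeDrain(PodInfo pod, NodeInfo node);
    }

    public static final class EvictionDetector {
        private final PodEvictionListener listener;

        public EvictionDetector(PodEvictionListener listener) {
            this.listener = listener;
        }

        // Called for each pod event; returns true if the listener was notified.
        public boolean onPodEvent(PodInfo pod, Map<String, NodeInfo> nodesByName) {
            if (!pod.evicted() || pod.nodeName() == null) {
                return false; // not an eviction, or pod was never scheduled
            }
            NodeInfo node = nodesByName.get(pod.nodeName());
            // Assumption: an unschedulable (cordoned) node stands in for "being drained".
            if (node == null || !node.unschedulable()) {
                return false;
            }
            listener.onPodEvictedDuringNodeDrain(pod, node);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> notified = new ArrayList<>();
        EvictionDetector detector =
            new EvictionDetector((pod, node) -> notified.add(pod.name() + "@" + node.name()));

        Map<String, NodeInfo> nodes = Map.of(
            "node-a", new NodeInfo("node-a", true),   // cordoned
            "node-b", new NodeInfo("node-b", false)); // healthy

        detector.onPodEvent(new PodInfo("job-1", "node-a", true), nodes);
        detector.onPodEvent(new PodInfo("job-2", "node-b", true), nodes);
        System.out.println(notified);
    }
}
```

In a real controller the pod and node state would come from the watched resources, and the listener would be the only part a user has to implement.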

csviri avatar Sep 20 '24 07:09 csviri

Perhaps this should be a separate project, though?

metacosm avatar Sep 25 '24 18:09 metacosm

Maybe just a separate module within this project. Since it's called an SDK, tools and libraries for common subproblems fit, at least in my mind. What do you think?

csviri avatar Sep 26 '24 08:09 csviri

I get the point, but adding "random" utilities to the SDK project dilutes the SDK itself, in my opinion. We could, though, make it an example operator that would also be actually useful…

metacosm avatar Sep 26 '24 08:09 metacosm

The thing is that the notification system might vary based on how a specific company's platform handles such events: some might use Kubernetes events, others Kafka messages, to deliver these notifications.

csviri avatar Sep 26 '24 08:09 csviri

Then it makes even less sense to be part of the SDK if we cannot have a solution that works generically. Or am I missing something?

metacosm avatar Sep 26 '24 13:09 metacosm

Usually it works like this: companies have internal forks and internal builds of such open source projects (at least in my experience from multiple companies), where these extension points are used to fulfill internal requirements. See, for example, the resource listener in the Flink Operator: https://github.com/apache/flink-kubernetes-operator/blob/d946f3f9f3a7f12098cd82db2545de7c89e220ff/flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/listener/FlinkResourceListener.java#L36

The open source project actually does not provide any implementation (only for tests), but anyone in their internal fork can provide one.

csviri avatar Sep 26 '24 13:09 csviri

I don't plan to work on this for now; I will close it and reopen it later if needed.

csviri avatar Aug 01 '25 08:08 csviri