Manage resourceVersion to allow resilient restart of watch method
What is the feature and why do you need it:
We use the stream method of the Watch object: https://github.com/kubernetes-client/python/blob/94e42113a1fe5c580917decacdde879eab7406b3/kubernetes/base/watch/watch.py#L129.
Say I call this method with v1.list_namespace and no timeout specified (see https://github.com/kubernetes-client/python/blob/master/examples/watch/timeout-settings.md). We then observe the following:
- With no resourceVersion and no timeout specified, the stream first lists all namespaces as 'ADDED' events, ordered alphabetically by namespace name.
- The stream then waits for events with a self.resource_version that is probably quite old.
- We then hit the server-side timeout (the Kubernetes default described in the previous link, between 30 minutes and 1 hour), after which the watch hits a 410.
- We then need to restart the stream.
If a namespace is created during that 30-minute to 1-hour window, the watch stores a more recent resourceVersion, and the 410 is reached much later (probably depending on the history or activity on the cluster).
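To illustrate why the stored version can be stale, here is a toy model of the behaviour described above. It is not the client's code: the namespace names, the version numbers, and the helper replay_initial_list are all made up for the example, and it only assumes that the initial 'ADDED' events arrive in alphabetical order and that the version of the last event seen is the one kept.

```python
def replay_initial_list(items):
    """Return the version a client would hold if it kept the last event's version.

    `items` maps namespace name -> resourceVersion (hypothetical values).
    """
    stored = None
    # The initial 'ADDED' events are assumed to arrive ordered by name.
    for _name, version in sorted(items.items()):
        stored = version
    return stored


# Hypothetical cluster state: "alpha" was modified recently, "zeta" long ago.
namespaces = {"default": "12", "kube-system": "15", "zeta": "40", "alpha": "9000"}
# Hypothetical metadata.resourceVersion of the list response itself.
list_version = "9001"

stored = replay_initial_list(namespaces)
print("stored after initial list:", stored)
print("version of the list response:", list_version)
```

In this toy run the stored version is the one of "zeta", the last name alphabetically, even though both "alpha" and the list response carry far more recent versions.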
Describe the solution you'd like to see:
From our tests, the right resourceVersion for planning a restart is not the resourceVersion of the last event seen but the one available in the metadata of the response returned by the func argument of the stream method.
That response contains a metadata.resourceVersion, provided by Kubernetes, and restarting the stream from this resourceVersion generates no error.
We are not sure whether this metadata is available for every func.
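A minimal sketch of the restart strategy we have in mind, written with injected callables so it can be tried without a cluster. The names watch_from_list_version, handle, and max_cycles are hypothetical and not part of the client. It assumes that the list response exposes metadata.resource_version (true for v1.list_namespace, unverified for every func) and that a 410 surfaces as an exception carrying a status attribute, as ApiException does.

```python
from types import SimpleNamespace

HTTP_GONE = 410


def watch_from_list_version(list_func, stream_func, handle, max_cycles=None):
    """List, then watch from the list's own resourceVersion; re-list on 410.

    list_func:   e.g. v1.list_namespace
    stream_func: e.g. watch.Watch().stream
    handle:      callback invoked with each event (hypothetical parameter)
    max_cycles:  optional cap on list+watch rounds (hypothetical parameter)
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        snapshot = list_func()
        # Version of the list response, not of the last object listed.
        resource_version = snapshot.metadata.resource_version
        try:
            for event in stream_func(list_func, resource_version=resource_version):
                handle(event)
        except Exception as exc:
            # Only swallow "410 Gone"; anything else is a real error.
            if getattr(exc, "status", None) != HTTP_GONE:
                raise


# --- Offline demonstration with fakes (no cluster needed) ---
class FakeGone(Exception):
    status = HTTP_GONE


calls = []


def fake_list(**kwargs):
    version = str(100 + len(calls))
    return SimpleNamespace(metadata=SimpleNamespace(resource_version=version))


def fake_stream(func, resource_version=None):
    calls.append(resource_version)
    if len(calls) == 1:
        raise FakeGone()  # first watch expires
    yield {"type": "ADDED", "name": "demo"}


seen = []
watch_from_list_version(fake_list, fake_stream, seen.append, max_cycles=2)

# With the real client (assumes kubernetes is installed and a kubeconfig exists):
#   from kubernetes import client, config, watch
#   config.load_kube_config()
#   v1 = client.CoreV1Api()
#   watch_from_list_version(v1.list_namespace, watch.Watch().stream, print)
```

Note that in this sketch the objects returned by the list call are not passed to handle; a daemon that needs the initial state would have to process snapshot.items itself before watching.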
It is quite hard to understand how to use the watch method of the API when we want to maintain a daemon program that runs without errors. With no resourceVersion and no timeout specified, everyone should know that this kind of problem exists because of the way self.resource_version is stored.
/help
@roycaihw: This request has been marked as needing help from a contributor.