microbean-kubernetes-controller
microbean-kubernetes-controller copied to clipboard
Reflector#start() does not retry infinitely
I am not entirely sure but I believe that the relevant Go code ends up retrying forever:
https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L128-L137
But Reflector#start() will bomb out if it can't list things:
https://github.com/microbean/microbean-kubernetes-controller/blob/ee05d9ece9f591bc7052a2569a6422e804be42e8/src/main/java/org/microbean/kubernetes/controller/Reflector.java#L625
That line will throw a KubernetesClientException, and that's it. I think this whole method should (internally) be trying this list-and-watch loop forever.
For more details, see also the ListAndWatch function:
https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L165-L275
Another way to put this is that currently Reflector#start() really just models ListAndWatch, but it should be modeling Run in reflector.go as well.
Complicating matters is the fact that the fabric8 client will automatically try to reconnect watches if they fail.
I believe this code:
https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L226-L274
…is actually implemented by the fabric8 client, more or less:
https://github.com/fabric8io/kubernetes-client/blob/cee8550053e671bb0b3489e37196924f4c949a3f/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L293-L333
…so the problem may simply reduce to the body of the start() method needing to run infinitely, i.e. to ensure that the listing behavior is also retried ad nauseam.
Ah, wait; this is already handled:
https://github.com/microbean/microbean-kubernetes-controller/blob/ee05d9ece9f591bc7052a2569a6422e804be42e8/src/main/java/org/microbean/kubernetes/controller/Reflector.java#L1031-L1057
There could be better logging here. I'll add logging and then close this issue.
Need to also review the Go code to see if the "full replace" operation occurs in the event of a failure-and-retry occurrence.