How to detect 410 Gone in watch response?
https://github.com/abonas/kubeclient#starting-watch-version
In the documentation for the watch API, there's a great callout that clients need to handle 410 Gone errors. But the documentation doesn't explain how to detect this case. Could a little clarification be added to the doc explaining how to properly introspect the notice to find the status code?
e.g. Should we expect to see notice['object']['status'] == 410 or something?
I ask because there's currently a nasty problem in the fluentd "kubernetes metadata" plugin (a client of this library) where this error is not handled properly, and I'm trying to help resolve it. (See: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/226)
You're right that watch does not return actual HTTP error codes, but rather passes a notice containing the error message into the block:
```ruby
pry> kclient.watch_pods(resource_version: "123", as: :parsed) {|n| pp n}
{"type"=>"ERROR",
 "object"=>
  {"kind"=>"Status",
   "apiVersion"=>"v1",
   "metadata"=>{},
   "status"=>"Failure",
   "message"=>"too old resource version: 123 (391079)",
   "reason"=>"Gone",
   "code"=>410}}
```
This is unfortunately deliberate in k8s (https://github.com/kubernetes/kubernetes/issues/25151, https://github.com/kubernetes/kubernetes/issues/35068#issuecomment-261320887). Oh well.
LOL, in https://github.com/abonas/kubeclient/pull/436 I told myself it's OK not to document how to detect 410 as I intend to fix it "ASAP" :rofl:
See also this brain dump / discussion on improving this in kubeclient: https://github.com/abonas/kubeclient/pull/275#issuecomment-591928097. I still haven't implemented the changes planned there (PRs welcome!)
See also previous discussion in your project https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/214
Thanks for the info. Sounds like the way to check is notice['object']['code'] == 410. I'll suggest that in the fluentd kubernetes metadata plugin issue.
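Concretely, here's roughly the shape of the fix I have in mind — just a sketch, reusing a client like the one in the pry session above; `process` is a placeholder for the plugin's actual notice handling, and per the "starting watch version" README section, the list result exposes the resourceVersion to resume from:

```ruby
# List-then-watch loop: on 410 (or a dropped connection) the watch call
# returns, and we re-list to obtain a fresh resourceVersion.
loop do
  pods = client.get_pods # a real implementation would also resync its cache here
  client.watch_pods(resource_version: pods.resourceVersion, as: :parsed) do |notice|
    if notice['type'] == 'ERROR' && notice['object']['code'] == 410
      break # resourceVersion expired; fall through to re-list
    end
    process(notice) # placeholder handler
  end
end
```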
For the record, I'm not a contributor on that project. I'm just some poor shmuck out here running the elasticsearch-fluentd helm chart in my cluster, seeing the fluentd pod periodically blowing up because of the lack of error handling in that plugin. :-)
@cben I got ambitious and used your advice to PR a contribution to the fluentd plugin. https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/243
Thanks again.
BTW, the reason may not always be "Gone".
According to this, recent k8s versions are switching to "Expired":
https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/test/e2e/apimachinery/protocol.go#L68-L75
But the code is 410 in both cases.
(In general, it's better not to depend on reason fields when you have a choice.)
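e.g. a safer check keys off the numeric code (a sketch; the helper name is just illustrative):

```ruby
# reason may be "Gone" (older k8s) or "Expired" (newer), but code stays 410.
def watch_expired?(notice)
  notice['type'] == 'ERROR' && notice['object']['code'] == 410
end
```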