kubeclient

How to detect 410 Gone in watch response?

Open Ghazgkull opened this issue 5 years ago • 5 comments

https://github.com/abonas/kubeclient#starting-watch-version

In the documentation for the watch API, there's a great callout that clients need to handle 410 Gone errors. But from looking at the documentation, it's not clear how to detect this case. Can a little clarification please be added to the doc to explain how to properly introspect the notice to find the status code?

e.g. Should we expect to see notice['object']['status'] == 410 or something?


I ask this because there's currently a nasty problem in the fluentd "kubernetes metadata" plugin which is a client of this library, where this error is not being handled properly and I'm trying to help resolve the issue. (See: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/226)

Ghazgkull, Jul 09 '20 17:07

You're right that watch does not surface this as an actual HTTP error code. Instead it passes a notice of type "ERROR", whose object is a Status carrying the message and code, into the block:

pry> kclient.watch_pods(resource_version: "123", as: :parsed) {|n| pp n}
{"type"=>"ERROR",
 "object"=>
  {"kind"=>"Status",
   "apiVersion"=>"v1",
   "metadata"=>{},
   "status"=>"Failure",
   "message"=>"too old resource version: 123 (391079)",
   "reason"=>"Gone",
   "code"=>410}}

This is unfortunately deliberate in k8s (https://github.com/kubernetes/kubernetes/issues/25151, https://github.com/kubernetes/kubernetes/issues/35068#issuecomment-261320887). Oh well.
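A minimal sketch of checking for this, assuming notices are requested with `as: :parsed` as in the pry session above. The helper names `watch_error_code` and `gone?` are made up for illustration and are not part of kubeclient:

```ruby
# Sample notice, copied from the pry output above.
SAMPLE_NOTICE = {
  'type' => 'ERROR',
  'object' => {
    'kind' => 'Status',
    'apiVersion' => 'v1',
    'metadata' => {},
    'status' => 'Failure',
    'message' => 'too old resource version: 123 (391079)',
    'reason' => 'Gone',
    'code' => 410
  }
}.freeze

# Hypothetical helper: returns the numeric code carried by an ERROR notice,
# or nil for ordinary events (ADDED, MODIFIED, DELETED, ...).
def watch_error_code(notice)
  return nil unless notice['type'] == 'ERROR'

  object = notice['object']
  object.is_a?(Hash) ? object['code'] : nil
end

# Hypothetical helper: true when the notice is the in-band form of 410 Gone.
def gone?(notice)
  watch_error_code(notice) == 410
end
```

Inside a watch block this would be used as `if gone?(notice)`, before the notice is treated as a normal event.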

LOL, in https://github.com/abonas/kubeclient/pull/436 I told myself it's OK not to document how to detect 410 as I intend to fix it "ASAP" :rofl:

See also this brain dump / discussion on improving this in kubeclient: https://github.com/abonas/kubeclient/pull/275#issuecomment-591928097. I still haven't implemented the changes planned there (PRs welcome!)

cben, Jul 10 '20 16:07

See also previous discussion in your project https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/214

cben, Jul 10 '20 16:07

Thanks for the info. Sounds like the way to check is notice['object']['code'] == 410. I'll suggest that in the fluentd kubernetes metadata plugin issue.

For the record, I'm not a contributor on that project. I'm just some poor shmuck out here running the elasticsearch-fluentd helm chart in my cluster, seeing the fluentd pod periodically blowing up because of the lack of error handling in that plugin. :-)

Ghazgkull, Jul 10 '20 17:07

@cben I got ambitious and used your advice to PR a contribution to the fluentd plugin. https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/243

Thanks again.

Ghazgkull, Jul 10 '20 19:07

BTW, reason may not always be "Gone". According to this e2e test, recent k8s versions are switching to "Expired": https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/test/e2e/apimachinery/protocol.go#L68-L75. But code is 410 for both. (In general, it's better not to depend on reason fields when you have a choice.)
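As a rough sketch of what that could look like in a client, the loop below keys only on code 410 (never on reason) and re-lists to get a fresh resourceVersion before watching again. It assumes a client that responds to `get_pods(as: :parsed)` and `watch_pods(resource_version:, as: :parsed)` as in the README's list+watch example. The names `resource_version_too_old?` and `list_and_watch_pods` and the `max_relists` parameter are hypothetical, not kubeclient API:

```ruby
# Hypothetical helper: true for an ERROR notice whose Status has code 410,
# whether reason is "Gone" or "Expired".
def resource_version_too_old?(notice)
  object = notice['object']
  notice['type'] == 'ERROR' && object.is_a?(Hash) && object['code'] == 410
end

# Hypothetical list+watch loop. Yields every notice except the 410 ones,
# which instead trigger a fresh list and a new watch.
# max_relists bounds the number of list+watch rounds so the sketch terminates.
def list_and_watch_pods(client, max_relists: 3)
  max_relists.times do
    list = client.get_pods(as: :parsed)
    version = list['metadata']['resourceVersion']

    expired = false
    client.watch_pods(resource_version: version, as: :parsed) do |notice|
      if resource_version_too_old?(notice)
        expired = true
        break # leave this watch; the outer loop re-lists
      end
      yield notice # other notices, including other ERRORs, go to the caller
    end

    return unless expired # the watch ended for some other reason
  end
end
```

A caller would pass its real client and a block, for example `list_and_watch_pods(kclient) { |notice| pp notice }`. How to reconcile state after the re-list is left out here, since that depends on the application.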

cben, Aug 26 '20 05:08