hono
Adapter instance status service reports healthy adapter as SUSPECTED_DEAD
Hi,
I stumbled on a strange issue where commands didn't get through to the MQTT adapter. Environment: Hono 1.10.0 deployed with Helm.
For context: we delete the MQTT adapter every month with a k8s cronjob because of the external, public-facing Let's Encrypt certificate that expires every 3 months and is replaced regularly by cert-manager.
In this case the pod was recreated 16 days ago, has had no container restarts since, and works perfectly.

The subscription to command///req/# works without issues on the adapter:

However the routing of the command (sent maybe 10 seconds later in my test) fails because the container is suspected dead:

Note how the found adapter instance id reflects the name of the pod (hono-adapter-mqtt-vertx-57cf66fb95-cpbp6) since the last refresh, so it can't be a former pod.
Restarting the command router service seems to solve the problem (I suppose setting hono.commandRouter.svc.kubernetesBasedAdapterInstanceStatusServiceEnabled to false would be another workaround), but I'm wondering how an adapter instance that has never been terminated can get into SUSPECTED_DEAD status, and why it didn't get out of it.
I suspect it reached this:
https://github.com/eclipse/hono/blob/db5c71cc0669f2a1be9e09a094b857aa3b1894da/services/command-router/src/main/java/org/eclipse/hono/commandrouter/impl/KubernetesBasedAdapterInstanceStatusService.java#L338-L344
Thanks in advance for any insight.
It seems the scenario here is similar to the one described in #3225.
The Hono Command Router component is making requests to the Kubernetes API to check which Hono protocol adapter pods exist in the cluster. The scenario observed in #3225 was that right after a new protocol adapter pod was started and when devices had already initiated command message subscriptions with the new protocol adapter, the Kubernetes API still didn't include the new pod in its "get pods" result and the Command Router also didn't receive watch events via the API for the creation of the pod yet. In this scenario there was a cluster update happening at the same time, so this could have been the reason for the delays concerning the K8s API requests.
The scope of #3225 was somewhat different from the scenario in this issue, though. It was about fixing a "too early" progression from state SUSPECTED_DEAD to DEAD, so the fix for #3225 doesn't affect the issue here.
@ghys You wrote that the adapter being marked with state SUSPECTED_DEAD didn't resolve itself. How long did you keep the new protocol adapter pod running? Do you happen to still have the log output of the Command Router component? Entries containing KubernetesBasedAdapterInstanceStatusService would be of particular interest.
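For anyone collecting such entries from a saved log file, here is a minimal sketch of a filter. It assumes the Quarkus console log format that appears in this thread (timestamp, level, abbreviated logger name in brackets, thread name in parentheses, message). The function name `status_service_entries` and the regular expression are illustrative and not part of Hono.

```python
import re

# Assumed line layout (taken from the log excerpts in this thread):
# "<date> <time>,<millis> <LEVEL> [<abbreviated logger>] (<thread>) <message>"
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+"
    r"\[([^\]]+)\]\s+\(.*?\)\s+(.*)$"
)


def status_service_entries(lines, logger_suffix="KubernetesBasedAdapterInstanceStatusService"):
    """Return (timestamp, level, message) tuples for the status service's log entries.

    Lines that do not match the assumed layout (e.g. stack trace lines) are skipped.
    """
    entries = []
    for line in lines:
        match = LINE.match(line)
        if match and match.group(3).endswith(logger_suffix):
            entries.append((match.group(1), match.group(2), match.group(4)))
    return entries
```

Passing the lines of a log file to `status_service_entries` yields only the status service's own messages, which makes it easier to see when watches were recreated and which containers were registered.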
The MQTT adapter pod was recreated on July 26th and has been running since. It's unclear for how long exactly the commands were not routed. Unfortunately I don't have the logs of the previous Command Router pod (the one that was terminated and recreated yesterday).
On the current one I do see watchers being recreated several times but only during a 20-minute period this morning:
Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25611867 (25699032).
But I'm not sure it is related (found https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version about this error); and it resolved itself after these 20 minutes anyway:
______ _ _ _ _
| ____| | (_) | | | |
| |__ ___| |_ _ __ ___ ___ | |__| | ___ _ __ ___
| __| / __| | | '_ \/ __|/ _ \ | __ |/ _ \| '_ \ / _ \
| |___| (__| | | |_) \__ \ __/ | | | | (_) | | | | (_) |
|______\___|_|_| .__/|___/\___| |_| |_|\___/|_| |_|\___/
| |
|_|
Eclipse Hono Command Router
Go to https://www.eclipse.org/hono for more information.
Powered by Quarkus 2.2.2.Final
2022-08-11 15:46:45,529 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (main) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 1
2022-08-11 15:46:45,618 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (main) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 2
2022-08-11 15:46:45,619 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (main) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 3
...
2022-08-12 08:01:09,268 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:01:11,029 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:01:15,948 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25611867 (25699032)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25611867 (25699032)
... 13 more
2022-08-12 08:01:16,050 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:01:16,196 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 1
2022-08-12 08:01:16,196 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 2
2022-08-12 08:01:16,197 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 3
2022-08-12 08:01:16,232 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:05:54,049 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:05:55,054 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:05:57,105 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25699095 (25699370)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25699095 (25699370)
... 13 more
2022-08-12 08:05:57,207 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:05:57,247 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 1
2022-08-12 08:05:57,248 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 2
2022-08-12 08:05:57,248 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 3
2022-08-12 08:05:57,265 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:08:14,435 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:08:15,438 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:08:17,546 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25699438 (25699543)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25699438 (25699543)
... 13 more
2022-08-12 08:08:17,647 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:08:17,690 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 1
2022-08-12 08:08:17,690 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 2
2022-08-12 08:08:17,690 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 3
2022-08-12 08:08:17,712 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:12:45,500 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:12:46,504 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:12:48,524 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25699599 (25699880)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25699599 (25699880)
... 13 more
2022-08-12 08:12:48,625 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:12:48,649 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 1
2022-08-12 08:12:48,649 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 2
2022-08-12 08:12:48,649 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 3
2022-08-12 08:12:48,661 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:15:06,601 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:15:07,606 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:15:09,755 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25699967 (25700094)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25699967 (25700094)
... 13 more
2022-08-12 08:15:09,857 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:15:09,898 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 1
2022-08-12 08:15:09,899 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 2
2022-08-12 08:15:09,899 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 3
2022-08-12 08:15:09,920 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:17:29,760 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:17:30,763 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:17:32,791 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25700157 (25700261)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25700157 (25700261)
... 13 more
2022-08-12 08:17:32,892 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:17:32,949 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 1
2022-08-12 08:17:32,949 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 2
2022-08-12 08:17:32,950 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 3
2022-08-12 08:17:32,968 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
2022-08-12 08:20:02,856 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.io.EOFException null
2022-08-12 08:20:03,861 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.xxx.yyy.zzz/...) Exec Failure java.net.ConnectException Failed to connect to /10.xxx.yyy.zzz:443
2022-08-12 08:20:06,175 ERROR [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Watcher closed with error: io.fabric8.kubernetes.client.WatcherException: too old resource version: 25700444 (25700450)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:265)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:249)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:93)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 25700444 (25700450)
... 13 more
2022-08-12 08:20:06,277 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) Recreating watch
2022-08-12 08:20:06,312 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-mqtt-vertx-57cf66fb95-cpbp6], container [801fdb544a99]; active adapter containers now: 1
2022-08-12 08:20:06,313 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-http-vertx-5755f8d78d-xdglq], container [61e5901b7c56]; active adapter containers now: 2
2022-08-12 08:20:06,313 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) added entry for pod [hono-adapter-amqp-vertx-78759f9f57-5m8qk], container [ad36b05d113c]; active adapter containers now: 3
2022-08-12 08:20:06,326 INFO [org.ecl.hon.com.imp.KubernetesBasedAdapterInstanceStatusService] (OkHttp https://10.xxx.yyy.zzz/...) initialized list of active adapter containers: {801fdb544a99=hono-adapter-mqtt-vertx-57cf66fb95-cpbp6, 61e5901b7c56=hono-adapter-http-vertx-5755f8d78d-xdglq, ad36b05d113c=hono-adapter-amqp-vertx-78759f9f57-5m8qk}
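The repeating pattern in the log above (watch closed with "too old resource version", then "Recreating watch", then the container list being rebuilt from 1 to 3) matches the usual list-then-watch recovery loop. Below is a minimal sketch of that loop in Python, using a hypothetical in-memory `FakeApi` in place of the real Kubernetes API. All class and function names are assumptions for illustration and do not reflect the fabric8 client or Hono's code.

```python
class ResourceVersionTooOld(Exception):
    """Stands in for the 'too old resource version' error that closes a watch."""


class FakeApi:
    """Hypothetical in-memory stand-in for the Kubernetes pod API."""

    def __init__(self, pods, failures):
        self.pods = dict(pods)        # container id -> pod name
        self.failures = failures      # how many watch attempts fail
        self.resource_version = 100

    def list_pods(self):
        """Return a snapshot of the pods and the version it corresponds to."""
        return dict(self.pods), self.resource_version

    def watch(self, resource_version):
        """Start a watch; fail while simulated failures remain."""
        if self.failures > 0:
            self.failures -= 1
            self.resource_version += 50   # server state moved on in the meantime
            raise ResourceVersionTooOld(resource_version)
        return iter(())                   # no further events in this sketch


def run_watch(api, max_attempts=10):
    """List, then watch; on a 'too old' error, list again and recreate the watch."""
    relists = 0
    for _ in range(max_attempts):
        active, version = api.list_pods()   # rebuild the map from scratch
        relists += 1
        try:
            for event_type, container_id, pod_name in api.watch(version):
                if event_type == "ADDED":
                    active[container_id] = pod_name
                elif event_type == "DELETED":
                    active.pop(container_id, None)
            return active, relists
        except ResourceVersionTooOld:
            continue                        # corresponds to "Recreating watch"
    raise RuntimeError("watch could not be re-established")
```

In this sketch every recreation starts from a fresh listing, so the set of active containers ends up the same as before the error, which is consistent with the three adapter containers reappearing after each "Recreating watch" entry.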
If the problem of the adapter being SUSPECTED_DEAD occurs again then I'll be sure to update with what I see in the logs, especially after that monthly MQTT adapter restart. Thanks for your support!
The WatcherExceptions above point to some temporary instability of the K8s API server, I guess. While the watcher instance is stopped during that time, there shouldn't be any issues with failed command deliveries (the state is then reported as UNKNOWN and ignored).
If the watcher is connected but responses to K8s API requests arrive with a very long delay (as described in #3225), I could imagine issues regarding the SUSPECTED_DEAD state.
I've created #3384 to improve the handling in that case (also fixing potential synchronization issues - not relevant for Hono 1.10).
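To make the states discussed here concrete, the following is a simplified model of how such a status lookup could behave. It is a sketch under stated assumptions, not Hono's implementation: the class `StatusModel`, its methods, and the `grace_seconds` timeout for moving from SUSPECTED_DEAD to DEAD are all hypothetical. Only the four state names and the rule that a stopped watcher yields UNKNOWN come from this thread.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "ALIVE"
    SUSPECTED_DEAD = "SUSPECTED_DEAD"
    DEAD = "DEAD"
    UNKNOWN = "UNKNOWN"


class StatusModel:
    """Hypothetical model of an adapter instance status lookup."""

    def __init__(self, grace_seconds=60.0):
        self.watch_active = False       # False while the watch is being recreated
        self.active = set()             # container ids reported by the K8s API
        self.terminated = set()         # container ids seen as deleted
        self.suspected_since = {}       # container id -> time of first miss
        self.grace_seconds = grace_seconds  # assumed timeout, not a Hono setting

    def on_pod_added(self, container_id):
        """A (possibly delayed) ADDED event clears any earlier suspicion."""
        self.active.add(container_id)
        self.suspected_since.pop(container_id, None)

    def on_pod_deleted(self, container_id):
        self.active.discard(container_id)
        self.terminated.add(container_id)

    def status(self, container_id, now):
        if not self.watch_active:
            return Status.UNKNOWN       # callers ignore the status in this case
        if container_id in self.active:
            return Status.ALIVE
        if container_id in self.terminated:
            return Status.DEAD
        # Unknown container: maybe the API has not reported the new pod yet.
        first_miss = self.suspected_since.setdefault(container_id, now)
        if now - first_miss >= self.grace_seconds:
            return Status.DEAD
        return Status.SUSPECTED_DEAD
```

In this model a container that the API has not yet reported is SUSPECTED_DEAD only until its ADDED event arrives, so an instance that stays in that state for days would suggest that the event was never processed.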
Ok, good. Logs would be helpful for sure.
The upcoming 2.0.2 and 2.1.0 releases will contain improvements to the handling of delayed K8s API server responses. I'm not sure whether there was yet another problem involved in the scenario described in this issue. So we could keep this issue open for now and see whether the problem occurs again.
Thanks for the feedback @calohmn - we haven't done an impact analysis for upgrading to 2.x and probably aren't ready for it yet, but this is certainly appreciated.