hono icon indicating copy to clipboard operation
hono copied to clipboard

Investigate slower liveness check request processing in Hono 2.0

Open calohmn opened this issue 3 years ago • 3 comments

Compared with previous Hono versions, HTTP requests on the /liveness endpoint of Hono components seem to take longer on Hono 2.0 components. This means the Kubernetes liveness probes run into a timeout more often than before a Hono 2.0 update.

This was observed with a liveness probe timeout set to 1s on device-registry-mongodb and (to a lesser extent) hono-service-auth components.

In Hono 2.0, liveness requests get handled via the Quarkus SmallRye Health component. This involves some more overhead compared to the plain vert.x HTTP server implementation before. The question is, in how far this is relevant concerning the observed request durations. Example stacktrace of the invocation of the SmallRye Health AsyncHealthCheck handler:

at org.eclipse.hono.service.SmallRyeHealthCheckServer$SmallRyeHealthAdapter.lambda$register$4(SmallRyeHealthCheckServer.java:167)
at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
at io.smallrye.mutiny.operators.uni.builders.UniCreateFromDeferredSupplier.subscribe(UniCreateFromDeferredSupplier.java:25)
at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
at io.smallrye.mutiny.operators.uni.UniOnFailureFlatMap.subscribe(UniOnFailureFlatMap.java:31)
at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
at io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)
at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
at io.smallrye.mutiny.operators.uni.UniAndCombination$UniHandler.subscribe(UniAndCombination.java:234)
at io.smallrye.mutiny.operators.uni.UniAndCombination$AndSupervisor.run(UniAndCombination.java:88)
at io.smallrye.mutiny.operators.uni.UniAndCombination$AndSupervisor.access$000(UniAndCombination.java:57)
at io.smallrye.mutiny.operators.uni.UniAndCombination.subscribe(UniAndCombination.java:54)
at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:60)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.smallrye.health.SmallRyeHealthReporter.getLiveness(SmallRyeHealthReporter.java:221)
at io.smallrye.health.SmallRyeHealthReporter_ClientProxy.getLiveness(Unknown Source)
at io.quarkus.smallrye.health.runtime.SmallRyeLivenessHandler.getHealth(SmallRyeLivenessHandler.java:11)
at io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.doHandle(SmallRyeHealthHandlerBase.java:44)
at io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.handle(SmallRyeHealthHandlerBase.java:31)
at io.quarkus.smallrye.health.runtime.SmallRyeLivenessHandler.handle(SmallRyeLivenessHandler.java:7)
at io.quarkus.smallrye.health.runtime.SmallRyeHealthHandlerBase.handle(SmallRyeHealthHandlerBase.java:19)
at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:548)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)

Handling switches from a vert.x event-loop-thread to an executor thread here (via Context.executeBlocking).

The device-registry and hono-service-auth components don't even have specific liveness check actions. The protocol adapters and the Command Router use a "EventLoopBlockedCheck". I'm not sure whether that check is really needed.

calohmn avatar Jun 24 '22 13:06 calohmn

In Hono 1.x, using the Hono Helm chart, the liveness/readiness probes were done using HTTPS by default, whereas with Hono 2.0, there was a switch to plain HTTP.

calohmn avatar Jul 13 '22 09:07 calohmn

Shouldn't this make processing the requests faster rather than slower?

sophokles73 avatar Jul 13 '22 10:07 sophokles73

Yes, it makes the observed change in behaviour all the more strange.

calohmn avatar Jul 13 '22 11:07 calohmn