cryostat-legacy
Many vert.x-worker threads blocked during startup
During startup discovery, Cryostat discovers several targets and starts observing them, but after a while many threads block indefinitely and the UI becomes unreachable (or serves only static resources). The problem does not occur when Cryostat is configured with an empty CRYOSTAT_K8S_NAMESPACES variable.
WARNING: Thread Thread[vert.x-worker-thread-5,5,main] has been blocked for 60664 ms, time limit is 60000 ms
io.vertx.core.VertxException: Thread blocked
at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
at [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
at app//io.cryostat.recordings.RecordingMetadataManager.lambda$accept$9(RecordingMetadataManager.java:419)
at app//io.cryostat.recordings.RecordingMetadataManager$$Lambda$1112/0x0000000801596628.run(Unknown Source)
at app//io.cryostat.net.web.Vertexecutor.lambda$execute$0(Vertexecutor.java:63)
at app//io.cryostat.net.web.Vertexecutor$$Lambda$953/0x0000000801437618.handle(Unknown Source)
at app//io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
at app//io.vertx.core.impl.ContextBase$$Lambda$957/0x0000000801438000.handle(Unknown Source)
at app//io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
at app//io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
at app//io.vertx.core.impl.ContextBase$$Lambda$955/0x0000000801437a68.run(Unknown Source)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at [email protected]/java.lang.Thread.run(Thread.java:833)
$ grep 'has been blocked for ' /tmp/cryo.logs | sed 's/blocked for .*//' | sort | uniq
WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-0,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-1,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-10,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-11,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-12,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-13,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-14,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-15,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-16,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-17,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-18,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-19,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-2,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-3,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-4,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-5,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-6,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-7,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-8,5,main] has been
WARNING: Thread Thread[vert.x-worker-thread-9,5,main] has been
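The listing above names worker threads 0 through 19, that is, 20 distinct workers plus one event-loop thread. For anyone who prefers to tally this programmatically, here is a minimal Java sketch of the same summary. The class and method names (`BlockedThreadSummary`, `blockedThreadNames`, `countWorkers`) are made up for illustration, and the regular expression assumes the warning format shown in the log excerpt above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cryostat or Vert.x.
class BlockedThreadSummary {
    // Matches e.g. "Thread[vert.x-worker-thread-5,5,main] has been blocked for 60664 ms"
    // and captures the thread name (the text before the first comma).
    private static final Pattern BLOCKED =
            Pattern.compile("Thread\\[([^,\\]]+),[^\\]]*\\] has been blocked for \\d+ ms");

    // Returns the sorted set of distinct thread names that appear in blocked-thread warnings.
    static Set<String> blockedThreadNames(List<String> logLines) {
        Set<String> names = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = BLOCKED.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    // Counts how many of the given names are Vert.x worker threads.
    static long countWorkers(Set<String> names) {
        return names.stream().filter(n -> n.startsWith("vert.x-worker-thread-")).count();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the log path used in the grep command above.
        String path = args.length > 0 ? args[0] : "/tmp/cryo.logs";
        Set<String> names = blockedThreadNames(Files.readAllLines(Paths.get(path)));
        names.forEach(System.out::println);
        System.out.println("distinct blocked worker threads: " + countWorkers(names));
    }
}
```

If the worker count printed equals the configured worker pool size, every worker was blocked at some point in the captured log.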
Environment:
K8S, cryostat-operator, Cryostat with multi-namespace configuration. Affects v2.0.0-SNAPSHOT-832-g741dbbef; the problem did not appear with v2.0.0-SNAPSHOT-772-g1672b235.
I have a hunch that this is related to or caused by #1388. That PR didn't completely solve what it set out to (so #1402 has been in draft, trying to finish the job). I think it's probably worthwhile backing out the riskier parts of #1388 for the 2.3.0 release and putting the entire fix in #1402 to merge later, if that can be done.
@miratx how many target applications do you have across the K8S_NAMESPACES that your Cryostat instance is monitoring? I suspect this problem manifests when there are ~20+ targets (the size of the worker thread pool). If so, I can try to reproduce it and then use the reproduction to test #1449 as a solution.
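One way a target count at or above the worker pool size can hang an application is thread-pool starvation: every worker parks on a `CountDownLatch` whose countdown can only happen on that same, now fully occupied, pool. The sketch below reproduces that pattern with a plain `ExecutorService`. It is not Cryostat's code, and whether `RecordingMetadataManager` waits on work scheduled to the same pool is an assumption the thread does not confirm. The class and method names are hypothetical, and the `gate` latch is an artificial device that makes all tasks arrive together, as they might during startup discovery.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: hypothetical names, not Cryostat or Vert.x code.
class WorkerPoolStarvationDemo {

    /**
     * Runs one "outer" task per target on a fixed pool of poolSize threads.
     * Each outer task submits a follow-up task to the SAME pool and then waits,
     * without a timeout, for that follow-up to count down a latch.
     * Returns true if every outer task finished within timeoutMs.
     */
    static boolean allTasksComplete(int poolSize, int targets, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Reaches zero once as many outer tasks as can run concurrently have started.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, targets));
        CountDownLatch gate = new CountDownLatch(1);      // released once they have all started
        CountDownLatch allDone = new CountDownLatch(targets);
        try {
            for (int i = 0; i < targets; i++) {
                pool.execute(() -> {
                    try {
                        started.countDown();
                        gate.await();
                        CountDownLatch inner = new CountDownLatch(1);
                        pool.execute(inner::countDown); // needs a free worker to run
                        inner.await();                  // unbounded wait, like CountDownLatch.await() in the trace
                        allDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();
            gate.countDown();
            return allDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupts any workers still parked on a latch
        }
    }

    public static void main(String[] args) {
        System.out.println("17 targets, pool of 20: " + allTasksComplete(20, 17, 5000));
        System.out.println("21 targets, pool of 20: " + allTasksComplete(20, 21, 500));
    }
}
```

With fewer targets than workers, spare threads remain to run the follow-up tasks and every latch is released. Once the number of simultaneously waiting tasks reaches the pool size, the follow-ups sit in the queue behind workers that never return. Common ways out of this pattern are bounding the wait with `await(timeout, unit)` or running the awaited work on a different executor.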
@andrewazores good point. It now works OK with 17 targets; more than 20 targets caused the problem. Thanks!
Thanks very much for confirming, @miratx. I think my PR #1449 will fix this report then.
Possibly related #1669