cats-effect

WSTP threads leak

Open DeviLab opened this issue 7 months ago • 4 comments

After the WSTP (work-stealing thread pool) implementation was switched to LinkedTransferQueue (#4295), cached threads are polled in insertion order: the thread that was cached first is picked up first. This change has introduced an issue in high-load applications where cached threads may never terminate.
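The FIFO hand-off can be checked in isolation against the JDK class itself. The following is a minimal sketch, not the cats-effect code: the class name `FifoHandoffDemo`, the method `pollOrder`, and the thread names are hypothetical, and plain Java threads stand in for cached WSTP workers. Each thread parks in a timed `tryTransfer`, and the main thread then polls the queue to see who gets picked up first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;

final class FifoHandoffDemo {

    // Parks `count` named threads on tryTransfer, one after another, then
    // polls the queue and returns the names in the order they were handed out.
    static List<String> pollOrder(int count) throws InterruptedException {
        LinkedTransferQueue<String> cache = new LinkedTransferQueue<>();
        List<Thread> parked = new ArrayList<>();

        for (int i = 0; i < count; i++) {
            String name = "blocker-" + i; // hypothetical name, for illustration only
            Thread t = new Thread(() -> {
                try {
                    // Offer ourselves to the cache and wait (up to 10s) to be picked up.
                    cache.tryTransfer(name, 10, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
            parked.add(t);
            // Wait until this thread's element is enqueued before starting the
            // next one, so that the insertion order is well defined.
            while (cache.size() < i + 1) {
                Thread.sleep(1);
            }
        }

        List<String> order = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            order.add(cache.poll()); // takes the element at the head of the queue
        }
        for (Thread t : parked) {
            t.join();
        }
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(pollOrder(3)); // [blocker-0, blocker-1, blocker-2]
    }
}
```

The thread that has been parked the longest is always the one that `poll` releases, which is the ordering the rest of this report relies on.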

In the current implementation, a cached thread waits up to 60 seconds to be picked up (relevant code) and exits if that timeout expires. However, if the application frequently performs blocking operations, this timeout is unlikely to be reached. Because LinkedTransferQueue maintains strict FIFO ordering, the thread that has been waiting longest is always reused first, so work rotates through all cached threads and none of them stays idle long enough to time out and exit.
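The effect of the ordering on the timeout can be illustrated with a small deterministic model. This is a sketch under simplifying assumptions, not a reproduction of the WSTP: time advances in whole seconds, exactly one blocking task arrives per second, the borrowed thread is parked again immediately, and the names `IdleCacheSimulation` and `survivors` are hypothetical. It compares FIFO reuse with LIFO reuse (most recently parked thread first) for the same pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IdleCacheSimulation {

    /**
     * Simulates `seconds` of steady load: once per second a single blocking
     * task borrows one idle thread, which is parked again right away.
     * Returns how many threads are still alive at the end.
     */
    static int survivors(boolean fifo, int initialThreads, int timeoutSeconds, int seconds) {
        // Each entry is the time at which an idle thread was parked.
        // Newly parked threads are appended at the tail.
        Deque<Integer> idle = new ArrayDeque<>();
        for (int i = 0; i < initialThreads; i++) {
            idle.addLast(0); // threads left over from an earlier burst
        }

        for (int now = 1; now <= seconds; now++) {
            final int current = now;
            // Threads that have waited for the full timeout exit.
            idle.removeIf(parkedAt -> current - parkedAt >= timeoutSeconds);

            // One blocking task borrows a thread: the oldest entry under FIFO,
            // the newest under LIFO. If the cache is empty, a fresh thread
            // would be spawned instead; either way one thread parks afterwards.
            if (fifo) {
                idle.pollFirst();
            } else {
                idle.pollLast();
            }
            idle.addLast(now);
        }
        return idle.size();
    }

    public static void main(String[] args) {
        // 10 cached threads, 60s timeout, 10 minutes of one task per second.
        System.out.println(survivors(true, 10, 60, 600));  // 10
        System.out.println(survivors(false, 10, 60, 600)); // 1
    }
}
```

In this model a FIFO pool stops shrinking once its size drops to roughly the timeout multiplied by the task arrival rate, because every thread is then reused before its 60 seconds elapse. Under the same assumptions, a pool of about 1,320 threads would be kept alive by as few as 1320 / 60 = 22 blocking operations per second.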

As a result, a short burst of extra load can spawn a number of threads that are never terminated afterwards, so threads leak over time:

Image

Thread dump

"io-compute-blocker-0" #28 [90] daemon prio=5 os_prio=0 cpu=2833814.98ms elapsed=80558.77s tid=0x00007f06350c10d0 nid=90 waiting on condition [0x00007f060603b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    - parking to wait for <0x00000006b2115638> (a java.util.concurrent.LinkedTransferQueue)
    at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:410)
    at java.util.concurrent.LinkedTransferQueue$DualNode.await([email protected]/LinkedTransferQueue.java:452)
    at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:613)
    at java.util.concurrent.LinkedTransferQueue.tryTransfer([email protected]/LinkedTransferQueue.java:1246)
    at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:735)

"io-compute-blocker-1" #29 [91] daemon prio=5 os_prio=0 cpu=2832308.36ms elapsed=80558.77s tid=0x00007f06350c2310 nid=91 waiting on condition [0x00007f0605f3a000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    - parking to wait for <0x00000006b2115638> (a java.util.concurrent.LinkedTransferQueue)
    at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:410)
    at java.util.concurrent.LinkedTransferQueue$DualNode.await([email protected]/LinkedTransferQueue.java:452)
    at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:613)
    at java.util.concurrent.LinkedTransferQueue.tryTransfer([email protected]/LinkedTransferQueue.java:1246)
    at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:735)

"io-compute-blocker-2" #30 [92] daemon prio=5 os_prio=0 cpu=2833613.51ms elapsed=80558.77s tid=0x00007f06350c4bc0 nid=92 waiting on condition [0x00007f0605e39000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    - parking to wait for <0x00000006b2115638> (a java.util.concurrent.LinkedTransferQueue)
    at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:410)
    at java.util.concurrent.LinkedTransferQueue$DualNode.await([email protected]/LinkedTransferQueue.java:452)
    at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:613)
    at java.util.concurrent.LinkedTransferQueue.tryTransfer([email protected]/LinkedTransferQueue.java:1246)
    at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:735)

.......

"io-compute-blocker-1320" #1391 [1600] daemon prio=5 os_prio=0 cpu=50158.77ms elapsed=47910.93s tid=0x00007efb34004210 nid=1600 waiting on condition [0x00007ef57abeb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    - parking to wait for <0x00000006b2115638> (a java.util.concurrent.LinkedTransferQueue)
    at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:410)
    at java.util.concurrent.LinkedTransferQueue$DualNode.await([email protected]/LinkedTransferQueue.java:452)
    at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:613)
    at java.util.concurrent.LinkedTransferQueue.tryTransfer([email protected]/LinkedTransferQueue.java:1246)
    at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:735)

DeviLab • Apr 14 '25 15:04