jetty.project
Deadlock in Jetty using Logback
Jetty version(s) Jetty 10.0.11 Logback 1.2.11
Java version/vendor openjdk version "17.0.3" 2022-04-19 OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7) OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode)
OS type/version
Initially reproduced in GitHub Actions on the ubuntu-latest runner.
Also reproduced locally on OSX 12.2
Description
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy uses an internal AutoLock instance to synchronize access to mutable state; this lock is acquired on each toString() call. Logback's internals contain the synchronized method ch.qos.logback.core.AppenderBase.doAppend(), which is invoked for every logging event.
The following situation causes a deadlock and a full application outage due to blocked logging:
Thread 1:
// holds the AppenderBase.doAppend() monitor, then waits to acquire the AdaptiveExecutionStrategy lock inside toString()
AdaptiveExecutionStrategy.tryProduce() {
if (LOG.isDebugEnabled()) {
LOG.debug("{} tryProduce {}", this, wasPending);
}
}
Thread 2:
// holds the AdaptiveExecutionStrategy lock, then waits to acquire the AppenderBase.doAppend() monitor
AdaptiveExecutionStrategy.selectSubStrategy() {
try (AutoLock l = _lock.lock()) {
LOG.info("hello");
}
}
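For illustration, a minimal standalone sketch of the same lock-ordering cycle, with no Jetty or Logback classes involved (all names here are invented for the example; with unlucky timing the two threads deadlock):
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch
{
    // Stand-in for a synchronized Logback AppenderBase subclass.
    static class Appender
    {
        synchronized void doAppend(Object arg)
        {
            // Formatting calls toString() while the appender monitor is held.
            arg.toString();
        }
    }

    // Stand-in for AdaptiveExecutionStrategy: one lock guards state and toString().
    static class Strategy
    {
        private final ReentrantLock lock = new ReentrantLock();
        private final Appender appender;

        Strategy(Appender appender) { this.appender = appender; }

        void produce()
        {
            lock.lock(); // like selectSubStrategy() taking the AutoLock
            try { appender.doAppend("hello"); } // logging while the lock is held
            finally { lock.unlock(); }
        }

        @Override
        public String toString()
        {
            lock.lock(); // like AdaptiveExecutionStrategy.toString()
            try { return "Strategy"; }
            finally { lock.unlock(); }
        }
    }

    public static void main(String[] args)
    {
        Appender appender = new Appender();
        Strategy strategy = new Strategy(appender);
        new Thread(() -> appender.doAppend(strategy)).start(); // appender monitor -> strategy lock
        new Thread(strategy::produce).start();                 // strategy lock -> appender monitor
    }
}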
Thread dump with the issue:
"embedded-jetty-177" #177 prio=5 os_prio=31 cpu=4.44ms elapsed=391.19s tid=0x000000011a9fe800 nid=0x1490b waiting on condition [0x000000028b73d000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
- parking to wait for <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3/AbstractQueuedSynchronizer.java:715)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3/AbstractQueuedSynchronizer.java:938)
at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@17.0.3/ReentrantLock.java:153)
at java.util.concurrent.locks.ReentrantLock.lock(java.base@17.0.3/ReentrantLock.java:322)
at org.eclipse.jetty.util.thread.AutoLock.lock(AutoLock.java:44)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.toString(AdaptiveExecutionStrategy.java:524)
at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:277)
at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:249)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:211)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:161)
at ch.qos.logback.classic.spi.LoggingEvent.getFormattedMessage(LoggingEvent.java:293)
at com.workato.agent.log.CloudAppender.postProcessEvent(CloudAppender.java:97)
at com.workato.agent.log.CloudAppender.postProcessEvent(CloudAppender.java:39)
at com.workato.agent.log.CircularBufferAppender.append(CircularBufferAppender.java:125)
at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:82)
- locked <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
at ch.qos.logback.classic.Logger.debug(Logger.java:490)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:201)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$616/0x00000008005191f0.run(Unknown Source)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)
"embedded-jetty-178" #178 prio=5 os_prio=31 cpu=0.86ms elapsed=391.19s tid=0x000000012a640000 nid=0xb607 waiting for monitor entry [0x000000028bb55000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:63)
- waiting to lock <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
at ch.qos.logback.classic.Logger.debug(Logger.java:490)
at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:724)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor.startReservedThread(ReservedThreadExecutor.java:257)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor.tryExecute(ReservedThreadExecutor.java:232)
at org.eclipse.jetty.util.thread.QueuedThreadPool.tryExecute(QueuedThreadPool.java:735)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.selectSubStrategy(AdaptiveExecutionStrategy.java:331)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$616/0x00000008005191f0.run(Unknown Source)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)
Locked ownable synchronizers:
- <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
Also reproduced on Jetty 10.0.10, but cannot reproduce on 10.0.9
Is there a reason you are using an old logback? This is a known issue in the Logback 1.2.x series.
- See https://jira.qos.ch/browse/LOGBACK-1492
For Jetty 10 and Jetty 11 we use the Logback 1.3.0 series.
Also reproduced on Jetty 10.0.10, but cannot reproduce on 10.0.9
The class named AdaptiveExecutionStrategy came into being for release 10.0.6.
It had an update in release 10.0.8 for javadoc, with no change in logic.
https://github.com/eclipse/jetty.project/commits/jetty-10.0.x/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/AdaptiveExecutionStrategy.java
The behavior should be the same between 10.0.9 and 10.0.10.
@joakime Logback 1.2.* is not abandoned and is in fact being developed in parallel with 1.3.*. Migrating to the latest version is not an option right now due to dependency issues (e.g. 1.3.* has the Jakarta API in its transitive dependencies).
As for the deadlock issue, using an exclusive lock in the toString() implementation and leaking the this reference into uncontrolled code leads to a dangerous situation: toString() can later be called on any thread while arbitrary locks are held.
As a simple fix, the this reference can be replaced with an explicit toString() call:
LOG.debug("{} tryProduce {}", this, wasPending)
-> LOG.debug("{} tryProduce {}", this.toString(), wasPending)
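To spell out the effect (a sketch of the evaluation order, given an AppenderBase-derived appender like the one in the thread dump, which formats the message under its monitor):
// Lazy: Logback defers formatting, so this.toString() (and the AutoLock)
// is reached only inside doAppend(), while the appender monitor is held.
LOG.debug("{} tryProduce {}", this, wasPending);

// Eager: toString() runs here on the calling thread, so the AutoLock is
// taken and released before doAppend() ever acquires the appender monitor.
LOG.debug("{} tryProduce {}", this.toString(), wasPending);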
If you find these arguments reasonable, I can prepare a PR with the fix.
Let's look at the history of this lock in toString().
If this were a concern, it would happen across a far wider range than just 10.0.9 through 10.0.10.
It's been present in every release of Jetty from 9.4.4 onwards for the past 7 years (including all releases of Jetty 10 and Jetty 11).
July 29, 2022 (Today)
Last seen in Jetty 10.0.11. First seen in Jetty 10.0.6.
The current AdaptiveExecutionStrategy.toString()
https://github.com/eclipse/jetty.project/blob/4a6c2744a0d4ff404daee0a521e88ae386356e06/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/AdaptiveExecutionStrategy.java#L522-L528
June 16, 2021
Last seen in Jetty 10.0.5. First seen in Jetty 10.0.0.
https://github.com/eclipse/jetty.project/blob/a492473a73fee96680baf56aa9b3223d83502f0c/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L436-L442
This is what it looked like before the class was renamed in Jetty 10.0.6.
The rename of EatWhatYouKill to AdaptiveExecutionStrategy occurred in commit a415606e01b9cb80d8cfa3c3d66d41afb6c57ae5.
July 27, 2020
Last seen in Jetty 10.0.0. First seen in Jetty 10.0.0.
https://github.com/eclipse/jetty.project/blob/8d69fc41a7f8c26fcd90fc0f7300f81eaede06be/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L438-L441
August 8, 2017
Last seen in Jetty 10.0.0-alpha. First seen in Jetty 9.4.7.
https://github.com/eclipse/jetty.project/blob/a105be95e44fae91fcf8b85b5d6d0f78fd143a04/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L266-L272
This was a change in commit a105be95e44fae91fcf8b85b5d6d0f78fd143a04
March 31, 2017
Last seen in Jetty 9.4.6. First seen in Jetty 9.4.4.
https://github.com/eclipse/jetty.project/blob/f06c7c358f23674d33b629b3d5f402042ffd4656/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L336-L345
When the class was first created in commit f06c7c358f23674d33b629b3d5f402042ffd4656
@joakime Logback 1.2.* is not abandoned and is in fact being developed in parallel with 1.3.*. Migrating to the latest version is not an option right now due to dependency issues (e.g. 1.3.* has the Jakarta API in its transitive dependencies).
That jakarta dependency might look scary at first, but follow the trail ...
- https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.3.0-alpha16/logback-core-1.3.0-alpha16.pom
This references jakarta.servlet:jakarta.servlet-api:<ver-from-parent>, declared as <optional>true</optional>, so it won't be pulled in unless you specifically declare it.
- https://repo1.maven.org/maven2/ch/qos/logback/logback-parent/1.3.0-alpha16/logback-parent-1.3.0-alpha16.pom
This references jakarta.servlet:jakarta.servlet-api:5.0.0
Servlet 5.0 is part of the Jakarta EE 9 effort, which changed the namespace from javax.servlet to jakarta.servlet (the "jakarta ee big bang"). Servlet 5.0 is compatible with Jetty 11, but not with Jetty 10, which you are using now.
logback-core has no actual direct usage of servlet.
Those dependencies exist only to satisfy the JPMS requirements of logback-classic and logback-access.
logback 1.3.0 changed to using jakarta.servlet
in commit https://github.com/qos-ch/logback/commit/2ff2d4ef398b5c6863f43b8dc8e1d741c6d840d3 for release 1.3.0-alpha11 (and was motivated by the changes in Spring 6.0)
But servlet is still optional for logback, not mandatory. If you don't use the servlet features in logback-classic or logback-access then use logback without worry.
If you use logback-access, then note that logback-access has other issues, so don't use it (any version of logback-access) until they fix the open bugs with it.
logback-access bugs:
- https://github.com/qos-ch/logback/pull/532
- LOGBACK-1580
- LOGBACK-1581
Jetty bugs about logback-access:
- https://github.com/eclipse/jetty.project/issues/8148
- https://github.com/eclipse/jetty.project/issues/7617
- https://github.com/eclipse/jetty.project/issues/7712
We are heavy users of logback ourselves; in fact we prefer it over log4j2 and alternatives, and have not experienced this deadlock issue ourselves, even during extreme load testing (think saturated multi-gigabit network interfaces during testing). A deadlock would definitely be noticed. We use logback 1.2 in Jetty 9.x (which is now at End of Community Support) and logback 1.3+ for Jetty 10/11/12 in our own testing.
Have you considered that this deadlock could be due to a specific configuration quirk on your logback setup/configuration? This has happened before, see LOGBACK-1422 (it shows a few different examples of configurations that can trigger it, and even some workarounds/fixes)
As a simple fix, the this reference can be replaced with an explicit toString() call:
LOG.debug("{} tryProduce {}", this, wasPending)
-> LOG.debug("{} tryProduce {}", this.toString(), wasPending)
If you find these arguments reasonable, I can prepare a PR with the fix.
This is a simple enough change; I can make it. But I have doubts it will change anything, as that specific line is never within a lock.
Ideally, a testcase that can replicate this is an important first step.
If it is an issue, we should be able to replicate it without the need for logback deps.
Using the jetty-slf4j-impl, you can instantiate a new JettyLogger with a test/custom Appender that does the locking in the same way as logback.
Then attempt to cause the deadlock in the identified code.
Then, finally, a PR to fix it can follow.
When we break down your thread dump, we see ...
"embedded-jetty-177" #177 prio=5 os_prio=31 cpu=4.44ms elapsed=391.19s tid=0x000000011a9fe800 nid=0x1490b waiting on condition [0x000000028b73d000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
- parking to wait for <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
...
at org.eclipse.jetty.util.thread.AutoLock.lock(AutoLock.java:44)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.toString(AdaptiveExecutionStrategy.java:524)
...
at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:82)
- locked <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
"embedded-jetty-178" #178 prio=5 os_prio=31 cpu=0.86ms elapsed=391.19s tid=0x000000012a640000 nid=0xb607 waiting for monitor entry [0x000000028bb55000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:63)
- waiting to lock <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
...
at ch.qos.logback.classic.Logger.debug(Logger.java:490)
at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:724)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor.startReservedThread(ReservedThreadExecutor.java:257)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor.tryExecute(ReservedThreadExecutor.java:232)
at org.eclipse.jetty.util.thread.QueuedThreadPool.tryExecute(QueuedThreadPool.java:735)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.selectSubStrategy(AdaptiveExecutionStrategy.java:331)
...
The lock created by com.workato.agent.log.CloudAppender seems to be the cause, not the lock in AdaptiveExecutionStrategy (at least that's what we can tell from your limited stacktrace).
The lock created by com.workato.agent.log.CloudAppender seems to be the cause, not the lock in AdaptiveExecutionStrategy (at least that's what we can tell from your limited stacktrace)
This class doesn't contain any synchronized methods. It extends ch.qos.logback.core.AppenderBase, which has a synchronized doAppend() method, and that is why its instance actually holds the lock.
I agree that our setup may not be the most common one, since we use a custom Logback appender and a custom Jetty connector. But we don't use any Jetty hacks, just the documented API, and there is proof that the deadlock can be formally reproduced. I'm trying to reproduce the issue with a minimal test scenario, but it'll take some time.
Just an idea: maybe a ReadWriteLock would work better for org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy, since read operations wouldn't cause any exclusive locking on the instance.
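A rough sketch of what that would look like (hypothetical code, not Jetty's actual implementation); note that a ReentrantReadWriteLock read lock still waits while a writer holds the write lock, so on its own this would not break the cycle shown in the thread dump above:
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical rework of the strategy's locking, for discussion only.
class StrategyWithRwLock
{
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private int state; // stands in for the strategy's mutable state

    void selectSubStrategy()
    {
        rwLock.writeLock().lock(); // mutators take the exclusive write lock
        try { state++; }
        finally { rwLock.writeLock().unlock(); }
    }

    @Override
    public String toString()
    {
        // Readers share the read lock, but it still blocks while a writer
        // holds the write lock, so a writer that logs (and parks inside the
        // appender) can still deadlock with a toString() called under the
        // appender monitor.
        rwLock.readLock().lock();
        try { return "state=" + state; }
        finally { rwLock.readLock().unlock(); }
    }
}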
Sorry but the logging framework should deal with these issues, as toString()
is entitled to lock and it must be possible to log with the same lock held.
The problem is the lock on the CloudAppender
like @joakime said.
If that is out of your control because it's performed by the Logback implementation, then it's a bug in Logback.
From your description, I bet you can reproduce this bug without any dependency on Jetty, just by implementing a toString() that grabs a lock, which would confirm it's a Logback issue.
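For example, a minimal Logback-only sketch along those lines (assuming Logback 1.2.x on the classpath; the class and logger names here are invented) might look like this:
import java.util.concurrent.locks.ReentrantLock;

import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;
import org.slf4j.LoggerFactory;

public class LogbackOnlyDeadlock
{
    // Extending AppenderBase inherits its synchronized doAppend().
    static class FormattingAppender extends AppenderBase<ILoggingEvent>
    {
        @Override
        protected void append(ILoggingEvent event)
        {
            // Formatting calls toString() on the arguments while the
            // appender monitor is still held.
            event.getFormattedMessage();
        }
    }

    // An object whose toString() takes a lock, like AdaptiveExecutionStrategy.
    static class LockedToString
    {
        final ReentrantLock lock = new ReentrantLock();

        @Override
        public String toString()
        {
            lock.lock();
            try { return "locked-to-string"; }
            finally { lock.unlock(); }
        }
    }

    public static void main(String[] args)
    {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        ch.qos.logback.classic.Logger logger = context.getLogger("deadlock.test");

        FormattingAppender appender = new FormattingAppender();
        appender.setContext(context);
        appender.start();
        logger.addAppender(appender);

        LockedToString subject = new LockedToString();

        // Thread 1: appender monitor -> subject lock (toString() during formatting).
        new Thread(() -> logger.info("value: {}", subject)).start();

        // Thread 2: subject lock -> appender monitor (logging while the lock is held).
        new Thread(() ->
        {
            subject.lock.lock();
            try { logger.info("holding the lock"); }
            finally { subject.lock.unlock(); }
        }).start();

        // Run under load / in a loop: with unlucky timing the two threads
        // deadlock on the appender monitor and the ReentrantLock.
    }
}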
@sbordet A servlet container, as a universal tool, is expected to work correctly in different environments. In this scenario, one of the critical classes from Jetty internals leaks its reference into a third-party SLF4J API implementation. That implementation is out of Jetty's control and can acquire an exclusive lock at any time and interrupt normal server behavior. It's common practice to pass this into a logging API, but not when it can be locked for debug purposes. I doubt anyone expects that a simple toString() call can block forever awaiting the most recent object state.
As an analogy: the same should hold for callbacks passed in from external code; if they fail, the server should continue to operate normally.
@alekkol the same Jetty code with another logging library that is not Logback does not have any problem, because the other logging libraries do not have the Logback bug.
It is not a Jetty issue.
Try, and you will be able to reproduce the bug using Logback without any Jetty dependency.
@sbordet Of course, it can be reproduced with the Logback library only. But the same is true the other way around: we have been using this logger for years and never had these issues before. They appeared only after migrating to Jetty from another servlet container.
I can agree that calling a synchronized method for every logging event is not the best solution in Logback, but the same is true for a toString() that takes an exclusive lock in Jetty. And both of these implementation details together lead to the deadlock.
The lock in toString() is necessary for correctness, because it accesses internal state that is protected by that lock.
We are not the only ones grabbing a lock in toString(), even the OpenJDK code does it, for example:
https://github.com/openjdk/jdk/blob/jdk-20%2B0/src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java#L1507
You agree that you are using a buggy logging library. File an issue there, or change the buggy library you are using.
File an issue there, or change the buggy library you are using.
That's why this issue was created. Both Jetty and Logback have the same bug: they call alien methods inside an exclusive lock section, provoking deadlocks. Jetty does it in the AdaptiveExecutionStrategy.selectSubStrategy method, Logback in AppenderBase.doAppend.
We are not the only ones grabbing a lock in toString(), even the OpenJDK code does it, for example: https://github.com/openjdk/jdk/blob/jdk-20%2B0/src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java#L1507
This example is not relevant, since SocketChannelImpl doesn't call external methods under this lock.
That's why this issue was created.
You have created it in the wrong project. It is not a Jetty issue.
And we are not calling alien methods. We are calling a logging API (SLF4J) that you have bound to a buggy implementation. Change the logging implementation and your problems will be solved.
If we start implementing workarounds in Jetty for other libraries' bugs, we would not have time to do any work on Jetty.
And we are not calling alien methods. We are calling a logging API (SLF4J) that you have bound to a buggy implementation. Change the logging implementation and your problems will be solved.
Why "calling a logging API (SLF4J)" in a critical section in Jetty is not a bug but calling Object.toString()
in the synchronized method in Logback is? Using this logic we can say that the Logback was "bound to a buggy implementation" of the servlet container. The fact that the issue in Logback was created earlier than this one can not be used as an argument of the problem's origin on their side.
Any implementation of SLF4J API can call Object.toString()
on a function argument inside a synchronized block because there are no explicit restrictions for that. And it can cause a sporadic Jetty outage due to the intermittent nature of the bug. Right now anyone can bootstrap a server with the latest version of Jetty and SLF4J+Logback and get unreliable software. I believe that it's not acceptable for mission-critical libraries.
If we start implementing workarounds in Jetty for other libraries' bugs, we would not have time to do any work on Jetty.
It's a workaround for the Jetty bug. And it should be pretty easy to make this error happen rarely or even never by simply not leaking references to Jetty's critical classes into external APIs.
Not a Jetty issue.
I guess my app is sometimes hitting something similar after a Jetty version upgrade; I'm trying to investigate whether it is related.
@alekkol The reproduction algorithm is not clear enough, so I cannot trace the bug. Could you elaborate on how you reproduce the bug?
It seems the two locks are always being "logically" nested, because the toString invocation always follows AFTER entering the doAppend method (i.e. after syncing on the AppenderBase monitor).
Correct me if I'm wrong, the order of key events is:
- lock A: enter doAppend => sync on the AppenderBase monitor
- lock B: enter toString => AutoLock in AdaptiveExecutionStrategy
- unlock B: return from toString => AutoLock in AdaptiveExecutionStrategy
- unlock A: return from doAppend => unsync on the AppenderBase monitor
What is logically the possible deadlock scenario?
@diziaq that logic is a red herring, the problem has nothing to do with the AdaptiveExecutionStrategy.
The AutoLock on AdaptiveExecutionStrategy is unrelated to the issue, and was just caught up in the thread dump.
In this specific case (of the original issue) the custom appender (CloudAppender) is causing this issue entirely.
The original issue's CloudAppender (and CircularBufferAppender) chose to implement their solution on ch.qos.logback.core.AppenderBase, which has a synchronized modifier on Appender.doAppend(), and that fact introduces the bug into their implementation, which they have to deal with.
See:
- https://jira.qos.ch/browse/LOGBACK-649 - Deadlock because of logging
- https://jira.qos.ch/browse/LOGBACK-1492 - AppenderBase is deadlock prone
They could have chosen ch.qos.logback.core.UnsynchronizedAppenderBase instead and not had this issue. (BTW, most of the appender implementations in Logback use UnsynchronizedAppenderBase.)
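For reference, a rough sketch of what such an appender could look like (hypothetical class name; the queue hand-off is just one possible way to keep slow or locking work off the calling thread):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.UnsynchronizedAppenderBase;

// Hypothetical replacement for an AppenderBase-derived appender:
// UnsynchronizedAppenderBase.doAppend() holds no monitor, so a toString()
// that takes a lock during formatting cannot nest inside an appender lock.
public class QueueingCloudAppender extends UnsynchronizedAppenderBase<ILoggingEvent>
{
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);

    @Override
    protected void append(ILoggingEvent event)
    {
        // Format on the calling thread (no appender monitor held here),
        // then hand the result off; offer() never blocks the logging thread.
        buffer.offer(event.getFormattedMessage());
    }

    // A separate worker (not shown) would drain `buffer` and ship the
    // messages to the remote endpoint, doing any slow or locking work there.
}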