logging-log4j2
logging-log4j2 copied to clipboard
Threads contention because of OutputStreamManager.writeBytes
Description
We are using the RollingFileAppender with below configuration
appender.file.type=RollingFile
appender.file.name=LOGFILE
appender.file.fileName=${basePath}/out.log
appender.file.filePattern=${basePath}/out.log.%d{yyyy-MM-dd-HH}
appender.file.layout.type=PatternLayout
appender.file.layout.pattern=%d [%t] %-5p %c %x - %m%n
appender.file.policies.type=Policies
appender.file.policies.time.type=TimeBasedTriggeringPolicy
appender.file.policies.time.interval=1
appender.file.policies.time.modulate=true
Our application was crashing suddenly one day, we did not deploy any code changes, and nothing is changed in the environment. After attaching the profiler, we found that a single thread locked onto a resource at OutputStreamManager.writeBytes is blocking more than 100 threads which are trying to log, many more threads are in waiting state, eventually application is crashing. We tried restarting the application multiple times, we tried changing the hardware and also check whether we reached our storage limit or IO limits, nothing worked.
Eventual we fixed it by placing immediateFlush to false and root log level to warn. appender.file.immediateFlush=false
If we change the root log level to info, we are experiencing degraded performance even with immediateFlush false. Why did this happen? why suddenly and does using Async Logging help to get the performance back with info level logging? We are not using Async Appender or Async Logging currently.
Configuration
Version: log4j-core - 2.17.0 log4j-api - 2.17.0 log4j-jcl - 2.17.1 log4j-slf4j-impl - 2.17.1
Operating system: [OS and version] Ubuntu 20.04.2 LTS (Focal Fossa) JDK: [JDK distribution and version] openjdk version "1.8.0_402" OpenJDK Runtime Environment (build 1.8.0_402-8u402-ga-2ubuntu1~20.04-b06) OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)
Logs
**Locked Thread**
http-nio-8080-exec-123
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
**108 Blocked Threads like below thread**
http-nio-8080-exec-276
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- waiting to lock <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
Reproduction
We still didn't figured out what caused this issue suddenly.
I think this is a deadlock.
Does this deadlock coincide with the hourly log file rollover you configured? I wonder if there's a race condition here between incoming log events and the rollover process.
@tcmot according the thread dump analysis we did, there is no deadlock.
@jvz If it is related to the rollover, this issue should be only seen at the end of an hour when the rollover happens but we have seen threads getting blocked everytime we expose our service to traffic, not just at the end of an hour.
Are all servers like this? Is there a problem with the hard drive? If there is no deadlock, it is likely that data cannot be written to the hard drive.
@bharathchandrathakkallapally-cpi
@tcmot yes, all the servers in the cluster are facing this issue. There is no problem with the hard drive or IO, we had the AWS guy checked our quota limits and we even tried provisioning new instances, it didn't worked.
I think that the crucial information that is missing here, is what is thread http-nio-8080-exec-123
waiting for. The lock on RollingFileManager
should be the last one required to log the event.
@bharathchandrathakkallapally-cpi, do you have additional data on the "blocking" thread? The jstack
command should print something like this:
"http-nio-8080-exec-123" daemon prio=10 tid=0x00007f3e88448800 nid=0x56f5 waiting for monitor entry [0x00000000472bc000]
@ppkarwasz here is the data your are asking
"http-nio-8080-exec-123" #2723 daemon prio=5 os_prio=0 tid=0x00007f7e94095800 nid=0x34a8f waiting for monitor entry [0x00007f7ea87c4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:542)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:500)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:483)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:388)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:153)
at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:376)
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:159)
@bharathchandrathakkallapally-cpi,
Any idea what kind of object 0x00007f7ea87c4000
might be? Does it appear in the stacktraces of other threads?
sorry, the object 0x00007f7ea87c4000
only appeared once in the dump, and it is with the "http-nio-8080-exec-123" thread .