logging-log4j2 icon indicating copy to clipboard operation
logging-log4j2 copied to clipboard

Threads contention because of OutputStreamManager.writeBytes

Open bharathchandrathakkallapally-cpi opened this issue 10 months ago • 9 comments

Description

We are using the RollingFileAppender with below configuration

appender.file.type=RollingFile 
appender.file.name=LOGFILE
appender.file.fileName=${basePath}/out.log
appender.file.filePattern=${basePath}/out.log.%d{yyyy-MM-dd-HH}
appender.file.layout.type=PatternLayout
appender.file.layout.pattern=%d [%t] %-5p %c %x - %m%n
appender.file.policies.type=Policies
appender.file.policies.time.type=TimeBasedTriggeringPolicy
appender.file.policies.time.interval=1
appender.file.policies.time.modulate=true

Our application was crashing suddenly one day, we did not deploy any code changes, and nothing is changed in the environment. After attaching the profiler, we found that a single thread locked onto a resource at OutputStreamManager.writeBytes is blocking more than 100 threads which are trying to log, many more threads are in waiting state, eventually application is crashing. We tried restarting the application multiple times, we tried changing the hardware and also check whether we reached our storage limit or IO limits, nothing worked.

Eventual we fixed it by placing immediateFlush to false and root log level to warn. appender.file.immediateFlush=false

If we change the root log level to info, we are experiencing degraded performance even with immediateFlush false. Why did this happen? why suddenly and does using Async Logging help to get the performance back with info level logging? We are not using Async Appender or Async Logging currently.

Screenshot 2024-04-05 at 3 43 51 PM

Configuration

Version: log4j-core - 2.17.0 log4j-api - 2.17.0 log4j-jcl - 2.17.1 log4j-slf4j-impl - 2.17.1

Operating system: [OS and version] Ubuntu 20.04.2 LTS (Focal Fossa) JDK: [JDK distribution and version] openjdk version "1.8.0_402" OpenJDK Runtime Environment (build 1.8.0_402-8u402-ga-2ubuntu1~20.04-b06) OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

Logs

**Locked Thread**
http-nio-8080-exec-123 
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)

**108 Blocked Threads like below thread**
http-nio-8080-exec-276
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- waiting to lock <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)

Reproduction

We still didn't figured out what caused this issue suddenly.

I think this is a deadlock.

tcmot avatar Apr 05 '24 12:04 tcmot

Does this deadlock coincide with the hourly log file rollover you configured? I wonder if there's a race condition here between incoming log events and the rollover process.

jvz avatar Apr 05 '24 17:04 jvz

@tcmot according the thread dump analysis we did, there is no deadlock.

@jvz If it is related to the rollover, this issue should be only seen at the end of an hour when the rollover happens but we have seen threads getting blocked everytime we expose our service to traffic, not just at the end of an hour.

Are all servers like this? Is there a problem with the hard drive? If there is no deadlock, it is likely that data cannot be written to the hard drive.

@bharathchandrathakkallapally-cpi

tcmot avatar Apr 07 '24 08:04 tcmot

@tcmot yes, all the servers in the cluster are facing this issue. There is no problem with the hard drive or IO, we had the AWS guy checked our quota limits and we even tried provisioning new instances, it didn't worked.

I think that the crucial information that is missing here, is what is thread http-nio-8080-exec-123 waiting for. The lock on RollingFileManager should be the last one required to log the event.

@bharathchandrathakkallapally-cpi, do you have additional data on the "blocking" thread? The jstack command should print something like this:

"http-nio-8080-exec-123" daemon prio=10 tid=0x00007f3e88448800 nid=0x56f5 waiting for monitor entry [0x00000000472bc000]

ppkarwasz avatar Apr 08 '24 07:04 ppkarwasz

@ppkarwasz here is the data your are asking

"http-nio-8080-exec-123" #2723 daemon prio=5 os_prio=0 tid=0x00007f7e94095800 nid=0x34a8f waiting for monitor entry [0x00007f7ea87c4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
	- locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
	at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
	at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
	at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:542)
	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:500)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:483)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:388)
	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
	at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:153)
	at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:376)
	at org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:159)

@bharathchandrathakkallapally-cpi,

Any idea what kind of object 0x00007f7ea87c4000 might be? Does it appear in the stacktraces of other threads?

ppkarwasz avatar Apr 08 '24 12:04 ppkarwasz

sorry, the object 0x00007f7ea87c4000 only appeared once in the dump, and it is with the "http-nio-8080-exec-123" thread .