
Appender may cause deadlock on AsyncLoggerDisruptor

Open quaff opened this issue 1 year ago • 8 comments

Description

A log operation blocks when the RingBuffer is full, waiting for the appender to consume LogEvents from the RingBuffer; but the appender itself may log before consuming, which triggers a deadlock.

Here is an actual case from my production system. I'm using the Kafka appender, and logging from the Kafka client caused a deadlock; see TransactionManager.java:

"kafka-producer-network-thread | producer-1" #44 daemon prio=5 os_prio=0 tid=0x00007f468310c800 nid=0x256255 waiting for monitor entry [0x00007f467ad6c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:246)
    - waiting to lock <0x0000000701070490> (a java.lang.Object)
    at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:202)
    at org.apache.logging.log4j.core.async.AsyncLogger.access$100(AsyncLogger.java:67)
    at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:157)
    at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:130)
    at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:378)
    at org.apache.kafka.common.utils.LogContext$LocationAwareKafkaLogger.writeLog(LogContext.java:434)
    at org.apache.kafka.common.utils.LogContext$LocationAwareKafkaLogger.info(LogContext.java:387)
    at org.apache.kafka.clients.producer.internals.TransactionManager.setProducerIdAndEpoch(TransactionManager.java:505)
    at org.apache.kafka.clients.producer.internals.TransactionManager.bumpIdempotentProducerEpoch(TransactionManager.java:542)
    at org.apache.kafka.clients.producer.internals.TransactionManager.bumpIdempotentEpochAndResetIdIfNeeded(TransactionManager.java:559)
    - locked <0x00000007016c0898> (a org.apache.kafka.clients.producer.internals.TransactionManager)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:330)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:247)
    at java.lang.Thread.run(Thread.java:748) 
"http-nio-8080-exec-273" #3682 daemon prio=5 os_prio=0 tid=0x00007f465b422800 nid=0x2908d3 runnable [0x00007f4648b48000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
    at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
    at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:247)
    - locked <0x0000000701070490> (a java.lang.Object)
    at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
    at org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
    at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
    at org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
    at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2168)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2122)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2105)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1997)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1862)
    at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:185)

Proposal

The issue is not Kafka-specific: Log4j should discard LogEvents originating from appenders when the underlying RingBuffer is full.

quaff avatar Aug 28 '24 02:08 quaff

This is not limited to the Kafka client: any client library that integrates log4j2 has the same problem, because the library itself also produces log events.

Possible workarounds:

  • Disable log4j2 for client component.
  • Increase the capacity of RingBuffer.
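For the second suggestion, the ring buffer capacity is controlled by a configuration property. A sketch, assuming a `log4j2.component.properties` file is used (the value below is only an example; the size should be a power of two):

```properties
# log4j2.component.properties
# Enlarge the async logger ring buffer (example value: 2^20 slots).
log4j2.asyncLoggerRingBufferSize=1048576
```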

tcmot avatar Aug 28 '24 04:08 tcmot

> Disable log4j2 for client component.

Is there a way to disable logging for a particular library, other than raising its logging level threshold?

> Increase the capacity of RingBuffer.

Not an option; it will fill up sooner or later.

quaff avatar Aug 28 '24 06:08 quaff

[!TIP] TL;DR: Please set the log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull Log4j configuration property to false. See Property Sources for a list of ways to do it.
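For example, assuming the property is set through a `log4j2.component.properties` file on the classpath (a Java system property or environment variable works too), a minimal sketch:

```properties
# log4j2.component.properties
# Do not synchronize producer threads on a shared lock when the
# async logger queue is full (the workaround suggested above).
log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull=false
```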

@quaff,

There are already several configuration properties that regulate what happens when the queue becomes full.

Note that Log4j Core never blocks on the asynchronous thread used by AsyncLoggerDisruptor:

https://github.com/apache/logging-log4j2/blob/b593be7d881bb4c46d8ea09e14bb4ee9dea9567b/log4j-core/src/main/java/org/apache/logging/log4j/core/async/AsyncLoggerDisruptor.java#L331-L340

Your problem arises because the Kafka appender uses its own background thread to send log events, and Log4j Core does not know that it should not block on that thread. You have several solutions to this problem:

  • You can simply tell Log4j Core not to block on any thread by setting log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.
  • You can stop using asynchronous loggers with the Kafka Appender: IMHO it doesn't make sense to have two asynchronous barriers that log events must pass before being logged.
  • You can use a synchronous logger for org.apache.kafka and an asynchronous logger for the rest (see Mixed sync/async loggers):
    <AsyncRoot level="INFO">
      <AppenderRef ref="KAFKA"/>
    </AsyncRoot>
    <Logger name="org.apache.kafka" additivity="false">
      <AppenderRef ref="CONSOLE"/>
    </Logger>
    

[!WARNING] To switch from the "All asynchronous loggers" to the "Mixed sync/async loggers", you need to restore the default value of the log4j2.contextSelector configuration property.
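Concretely, assuming the selector was set in `log4j2.component.properties`, restoring the default means removing or commenting out the selector line:

```properties
# Comment this out to fall back to the default context selector,
# which is required for mixed sync/async configurations:
# log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
```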

ppkarwasz avatar Aug 28 '24 07:08 ppkarwasz

@quaff, @ppkarwasz is spot on with his analysis and suggestions. My 2 cents are...

  1. Stick to the simplest setup (i.e., only use synchronous loggers) unless you can prove that this causes a significant performance regression for your application.
  2. Kafka Appender is planned to be removed. You should consider more future-proof alternatives.

vy avatar Aug 28 '24 07:08 vy

Thanks for your quick response. Currently I'm setting log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector and using two types of appenders: RollingFile for local files and Kafka for centralized log collection. I think mixed sync/async is suitable for this use case, making RollingFile async and Kafka sync; please correct me if I'm wrong.
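One way to realize a "RollingFile async, Kafka sync" split is to put an AsyncAppender in front of the file appender rather than using async loggers. A sketch only; the appender names FILE and KAFKA are assumptions:

```xml
<Appenders>
  <!-- FILE and KAFKA are assumed to be defined elsewhere. -->
  <Async name="ASYNC_FILE">
    <AppenderRef ref="FILE"/>
  </Async>
</Appenders>
<Loggers>
  <Root level="INFO">
    <AppenderRef ref="ASYNC_FILE"/> <!-- file writes are asynchronous -->
    <AppenderRef ref="KAFKA"/>      <!-- Kafka writes are synchronous -->
  </Root>
</Loggers>
```

Note that the AsyncAppender uses its own queue, not the disruptor ring buffer of fully asynchronous loggers.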

quaff avatar Aug 28 '24 08:08 quaff

> log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.

The correct system property key is AsyncLogger.SynchronizeEnqueueWhenQueueFull.

quaff avatar Aug 28 '24 08:08 quaff

> Thanks for your quick response. Currently I'm setting log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector and using two types of appenders: RollingFile for local files and Kafka for centralized log collection. I think mixed sync/async is suitable for this use case, making RollingFile async and Kafka sync; please correct me if I'm wrong.

Using RollingFile asynchronously for the org.apache.kafka package and Kafka synchronously for the rest should work, although I would recommend using only synchronous loggers.

When you use asynchronous elements, you lose the ability to receive LoggingExceptions when an appender fails. If logging is important for your application, consider a synchronous setup with a Failover appender that uses Kafka as the primary and RollingFile as the failover.
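A sketch of such a Failover setup; the appender names, topic, and broker address are assumptions. Note that the primary appender needs `ignoreExceptions="false"` so that failures propagate to the Failover appender:

```xml
<Appenders>
  <!-- Primary: exceptions must propagate for failover to trigger. -->
  <Kafka name="KAFKA" topic="app-logs" ignoreExceptions="false">
    <PatternLayout pattern="%d %p %c %m%n"/>
    <Property name="bootstrap.servers">localhost:9092</Property>
  </Kafka>
  <!-- Fallback: local rolling file. -->
  <RollingFile name="ROLLING" fileName="app.log" filePattern="app-%i.log.gz">
    <PatternLayout pattern="%d %p %c %m%n"/>
    <SizeBasedTriggeringPolicy size="10 MB"/>
  </RollingFile>
  <Failover name="FAILOVER" primary="KAFKA">
    <Failovers>
      <AppenderRef ref="ROLLING"/>
    </Failovers>
  </Failover>
</Appenders>
```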

ppkarwasz avatar Aug 28 '24 08:08 ppkarwasz

> > log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.

> The correct system property key is AsyncLogger.SynchronizeEnqueueWhenQueueFull.

That is the legacy pre-2.10 name of the configuration property. While it still works for backward compatibility, AsyncLogger.SynchronizeEnqueueWhenQueueFull is deprecated, and you should use:

  • log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull if you use Java System properties or a log4j2.component.properties file
  • LOG4J_ASYNC_LOGGER_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL if you want to set an OS environment variable.
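For example, the environment-variable form might be set like this in a container entrypoint or shell profile (the variable name is from the list above; `false` is the value suggested earlier in this thread):

```shell
# Equivalent to -Dlog4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull=false
export LOG4J_ASYNC_LOGGER_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL=false
```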

ppkarwasz avatar Aug 28 '24 08:08 ppkarwasz

This issue is stale because it has been waiting for your feedback for more than 60 days. The Apache Logging Services community values every submitted issue, but without additional information from you, we are unable to provide a solution to your problem. Please comment on this issue or it will be closed in 7 days.

github-actions[bot] avatar Oct 28 '24 02:10 github-actions[bot]