logging-log4j2 Create an alternative async logger implementation using JCTools

The current async logger functionality all relies on the LMAX Disruptor library. This library does not support resizable queues which requires end users to configure a large enough ring buffer for their application use case. As such, the default settings in Log4j allocate between 80 and 140 MB of memory for async logging ring buffers which are likely too large for microservices and other more modern use cases in memory constrained environments.

JCTools provides a growable MPSC (multi-producer single consumer) queue which would be a great alternative to using Disruptor in use cases where memory allocation is preferred over blocking or discarding log messages when the queue is full. This would allow for a smaller default queue size that could safely grow to fit an application's typical logging throughput.

This feature is a high level description of the problem. Given the large API differences between LMAX Disruptor and JCTools, I'm not sure on what this would look like specifically.

Jan 19 '24 20:01 jvz

What would be difference between this feature and AsyncAppender coupled by a JCTools queue that we already have?

We might simply enhance AsyncAppender to format the message on the calling thread before it sends it to the worker thread.

Jan 19 '24 22:01 ppkarwasz

The Disruptor serves as a queue and object pool combined. The issue, as I understand it, is that the upfront cost of ~80-140MB live set footprint is not an acceptable overhead for some users. Users can configure a small buffer to reduce the cost, but run the risk of the application threads getting locked when the logging thread falls behind and the buffer of messages is full.

Jan 20 '24 09:01 nitsanw

To re-implement using JCTools can be quite straight forward:

event = spmcObjectPool.getLogEvent();
if (event == null) event = new LogEvent();
event.populate(...);
mpscLogEventsQueue.offer(event); // need to handle queue full if this is not a unbounded queue

Use can use a bounded queue for the spmcObjectPool (to limit the footprint), and a growable/chunked/other MPSC implementation for the mpscLogEventsQueue. You will need to implement the full queue handling, and potentially the wakeup mechanism for the logging thread.

Jan 20 '24 09:01 nitsanw

A further alternative, if you want to format on the application thread (which I think is a good idea), is to output the formatted message into an Aeron log buffer. The logging IO thread will poll that and write it to file. This is IMO a better option than the Disruptor, and have very low impact on the GC (low allocation, very low live set), but will still have a large (configuration dependant) RSS impact, and does not shrink at rest. It also does not allow resizing IIRC.

IMO, the default for a library should be a configuration that scales the cost with the workload (i.e. very little logging -> very little memory/cpu cost).

Jan 20 '24 09:01 nitsanw

@jvz, the performance benefits of asynchronous loggers are extremely subjective to the deployment environment. Putting aside Log4j team itself is not able to reproduce the (8 year old!) performance figures shared in the website, Google'ing for "Log4j performance" returns a myriad of totally varying experiences. Yet it is a fact that asynchronous loggers are the biggest source of complexity in the code base. I also do not recall user complaints on Disruptor-related preallocation.

What would be difference between this feature and AsyncAppender coupled by a JCTools queue that we already have? We might simply enhance AsyncAppender to format the message on the calling thread before it sends it to the worker thread.

@ppkarwasz is spot on. I cannot agree more.

I think we should first move asynchronous loggers to a separate module and relieve the code base from its complexity and yet-to-be-proven performance benefits. (@ppkarwasz, haven't you had a PR for this somewhere?)

I am not inclined to add more to the existing complexity of asynchronous loggers without being able to effectively measure its performance. If we figure out, say, a JCTools-backed AsyncAppender can deliver up to 90% of the performance asynchronous loggers do, I am confident that a majority will prefer the simplicity of the former.

Jan 21 '24 20:01 vy

I also do not recall user complaints on Disruptor-related preallocation.

Most people never look at a heap dump.

I am confident that a majority will prefer the simplicity of the former.

I'm not sure the end solution will be simpler. It will probably be more "familiar" as JCTools conforms to the Queue interface, but ultimately the Disruptor is a great solution when you want a queue + object pool data structure. If there's a resizable Disruptor out there, maybe that would be an easy solution.

Jan 22 '24 07:01 nitsanw

@nitsanw,

You are probably right: the main difference between AsyncLogger, AsyncLoggerConfig and AsyncAppender is that the first two have full control over the lifecycle of mutable log events, so they can modify log events (e.g. swap parameters between events) without breaking other components.

The further down we go in the logging pipeline, the harder it becomes to perform such tricks. The logic of AsyncLogger is almost trivial (it breaks the log event by moving its components to the disruptor and resets the event); the logic of AsyncLoggerConfig is more complex (it must perform all synchronous logging before breaking the log event); AsyncAppender doesn't even try to deal with mutable events and takes a snapshot of the current event.

Jan 22 '24 09:01 ppkarwasz

Hi!

Entering in the discussion, because I'm one of the developers of JCTools (@nitsanw is both and the author!) and because I was curious :P

I've always considered a more preferreable approach to accumulate the (formatted? better) log events locally (if is not a virtual thread, in a thread local's buffer) - and have a flusher thread which drain at intervals. To backpressure, you can put some boundaries to the thread local capacity, and await/throttle if there's isn't any room. This approach (which can still use a unbounded queue, without signaling to poller/flush) guarantee that:

an almost idle poller/flusher, doesn't have to be awaken reducing the overhead on the logging/application threads (which doesn't compete nor spend any cycles on signaling)
allow the flusher to tune its own flushing frequency based on the estimated size of the flushing queue or just the estimated submission rate

Some care should be taken to ensure that the thread-local buffer is still "shared" with the flusher, which could uses a double buffer-like tecnique (a la hdr histogram) to "switch" with a fresh new thread-local buffer, while flushing what's being accumulated so far. This should allow to NOT have any leftover in case the application thread die.

The main downside for this approach is that "total ordering" isn't possible, but TBH, has ever been? Submitting to a queue guarantee some order based on the queue implementation, which as far as I know, is not guaranteed at all nor many verify it works as expected.

Jan 22 '24 10:01 franz1981

I like where this conversation is going so far! Do either of you have advice on performance measurements here? While I can certainly implement things according to recommendations here, I'm not very familiar with the benchmarking process.

Jan 22 '24 17:01 jvz

AsyncLogger will improve the perceived performance of the application by giving control back to it sooner than an AsyncAppender, and if you have filters and more than one appender that can make a difference. Note that it will NOT improve the actual speed of logging. While I do not like the idea of removing AsyncLogger support I am in favor of moving it to a log4j-async module, assuming that can be done.

Jan 23 '24 06:01 rgoers

logging-log4j2 logging-log4j2 copied to clipboard

Create an alternative async logger implementation using JCTools

logging-log4j2
logging-log4j2 copied to clipboard