monolog icon indicating copy to clipboard operation
monolog copied to clipboard

Why might NormalizerFormatter be logging malformed JSON?

Open garethellis36 opened this issue 3 years ago • 4 comments

Monolog version 1.24.0 on PHP 8, running on Windows

We've just started logging the peak memory usage of all web requests and CLI runs in our application so that we can do some focused performance refactoring.

To do this we've added a shutdown handler which invokes a Monolog Logger instance and records JSON in a file. That file is monitored by Filebeat, which ships each new line to Logstash, which then attempts to parse the line as JSON and store it in ElasticSearch.

I had a report from my colleague in the ELK team that our records were causing some errors in Logstash. On investigation I found that I had a few records (only a small percentage) which had malformed JSON in the log file on disk. For example, yesterday's log file has 220,384 lines, but there are 220,336 records in ElasticSearch (48 missing).

Example malformed JSON:

ct_tasks_steps/ajax_save_qtp/665613","http_method":"PUT"}}

The logger is created as follows:

                $logger = new Logger("memory_usage");

                $logger->pushProcessor(new MemoryPeakUsageProcessor(realUsage: true));

                $handler = new RotatingFileHandler($logFilesDirectory . "memory_usage.log", 10, MonologLogger::INFO);
                $handler->setFormatter(new MemoryUsageFormatter());

                $logger->pushHandler($handler);

                if (!environment()->isCLI()) {
                    $logger->pushProcessor($container[WebProcessor::class]);

                    // Logs the current route in template form, i.e. `/projects/view/:id`
                    $logger->pushProcessor(function (array $record) {
                        $route = Router::parse(Router::url());

                        $controller = $route['controller'] ?? 'unknown_controller';
                        $action = $route['action'] ?? 'unknown_action';
                        $record['extra']['controller_action'] = "/{$controller}/{$action}";

                        return $record;
                    });
                } else {
                    $logger->pushProcessor($container[CommandLineInterfaceArgumentsProcessor::class]);
                }

                return $logger;

MemoryUsageFormatter looks like this:

<?php declare(strict_types=1);

namespace App\Logging\Formatter;

use App\DateTime\DateTime;
use DateTimeZone;
use Monolog\Formatter\NormalizerFormatter;

class MemoryUsageFormatter extends NormalizerFormatter
{
    /**
     * Formats message for storage in ELK
     * Uses ECS with bare-minimum fields to keep file size as small as possible
     */
    public function format(array $record)
    {
        /** @var array $record */
        $record = parent::format($record);

        $message = [
            '@timestamp' => DateTime::createFromMysqlDateTime($record['datetime'], new DateTimeZone("UTC")),
            'ecs' => [
                'version' => '1.7',
            ],
            'message' => "logged_memory_usage", // an easily searchable string

            // deliberately title-cased to distinguish from ECS-mapped fields
            'MemoryUsage' => [
                'bytes' => (int)$record['message'],
                'formatted' => $record['extra']['memory_peak_usage'],
            ],
        ];

        if (isset($record['extra']['controller_action'])) {
            $message['MemoryUsage']['controller_action'] = $record['extra']['controller_action'];
        }

        if (isset($record['extra']['command'])) {
            $message['MemoryUsage']['command_line'] = $record['extra']['command'];
        }

        if (isset($record['extra']['url'])) {
            $message['MemoryUsage']['url'] = $record['extra']['url'];
        }

        if (isset($record['extra']['http_method'])) {
            $message['MemoryUsage']['http_method'] = $record['extra']['http_method'];
        }

        if (isset($record['context']['job'])) {
            $message['MemoryUsage']['job_uuid'] = $record['context']['job']['uuid'];
            $message['MemoryUsage']['job_class'] = $record['context']['job']['class'];
        }

        return $this->toJson($message) . "\n";
    }
}

I'm not really sure where to begin debugging this. Have you ever seen anything like this before? The example malformed message has truncated the JSON in the middle of the MemoryUsage.url property. /projec has been omitted from the beginning of the URL.

garethellis36 avatar Jan 28 '22 18:01 garethellis36

wild guess: multiple processes are writing in a non-atomic way to the same file and thus sometimes corrupting it?

mfn avatar Jan 29 '22 10:01 mfn

I wondered that, but Windows uses exclusive file locking, so it shouldn’t even be possible for a process to write to a file until locks made by other processes have been released.

garethellis36 avatar Jan 29 '22 15:01 garethellis36

I'd say try to upgrade to latest Monolog, 2.3 fixed some issues there with highly concurrent writes by increasing the stream chunk size so that logs are flushed at once to the file instead of being written in smaller chunks which perhaps can result in races like you're seeing.

If that doesn't fix it I am not sure what else to do sorry.

Seldaek avatar Mar 13 '22 19:03 Seldaek

Thanks, I'll give that a go. It's on my TODO list anyway :)

garethellis36 avatar Mar 14 '22 15:03 garethellis36