Why might NormalizerFormatter be logging malformed JSON?
Monolog version 1.24.0, on PHP 8, running on Windows.
We've just started logging the peak memory usage of all web requests and CLI runs in our application so that we can do some focused performance refactoring.
To do this we've added a shutdown handler which invokes a Monolog Logger
instance and records JSON in a file. That file is monitored by Filebeat, which ships each new line to Logstash, which then attempts to parse the line as JSON and store it in ElasticSearch.
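The shutdown hook itself boils down to something like this (a simplified sketch rather than the exact code; $logger comes from the factory shown further down):

register_shutdown_function(static function () use ($logger): void {
    // Log the peak memory usage in bytes as the record message; the
    // MemoryPeakUsageProcessor adds a human-readable version to the extras.
    $logger->info((string)memory_get_peak_usage(true));
});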
I had a report from my colleague in the ELK team that our records were causing some errors in Logstash. On investigation I found that I had a few records (only a small percentage) which had malformed JSON in the log file on disk. For example, yesterday's log file has 220,384 lines, but there are 220,336 records in ElasticSearch (48 missing).
Example malformed JSON:
ct_tasks_steps/ajax_save_qtp/665613","http_method":"PUT"}}
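For completeness, the malformed lines on disk can be counted with a quick script along these lines ($pathToLogFile is a placeholder for the rotated file):

$handle = fopen($pathToLogFile, 'rb');
$total = $bad = 0;
while (($line = fgets($handle)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue;
    }
    $total++;
    // Any line that does not decode cleanly is one of the truncated records
    json_decode($line, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        $bad++;
    }
}
fclose($handle);
echo "{$bad} malformed lines out of {$total}\n";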
The logger is created as follows:
$logger = new Logger("memory_usage");
$logger->pushProcessor(new MemoryPeakUsageProcessor(realUsage: true));

$handler = new RotatingFileHandler($logFilesDirectory . "memory_usage.log", 10, MonologLogger::INFO);
$handler->setFormatter(new MemoryUsageFormatter());
$logger->pushHandler($handler);

if (!environment()->isCLI()) {
    $logger->pushProcessor($container[WebProcessor::class]);

    // Logs the current route in template form, i.e. `/projects/view/:id`
    $logger->pushProcessor(function (array $record) {
        $route = Router::parse(Router::url());
        $controller = $route['controller'] ?? 'unknown_controller';
        $action = $route['action'] ?? 'unknown_action';
        $record['extra']['controller_action'] = "/{$controller}/{$action}";

        return $record;
    });
} else {
    $logger->pushProcessor($container[CommandLineInterfaceArgumentsProcessor::class]);
}

return $logger;
MemoryUsageFormatter looks like this:
<?php declare(strict_types=1);

namespace App\Logging\Formatter;

use App\DateTime\DateTime;
use DateTimeZone;
use Monolog\Formatter\NormalizerFormatter;

class MemoryUsageFormatter extends NormalizerFormatter
{
    /**
     * Formats message for storage in ELK
     * Uses ECS with bare-minimum fields to keep file size as small as possible
     */
    public function format(array $record)
    {
        /** @var array $record */
        $record = parent::format($record);

        $message = [
            '@timestamp' => DateTime::createFromMysqlDateTime($record['datetime'], new DateTimeZone("UTC")),
            'ecs' => [
                'version' => '1.7',
            ],
            'message' => "logged_memory_usage", // an easily searchable string
            // deliberately title-cased to distinguish from ECS-mapped fields
            'MemoryUsage' => [
                'bytes' => (int)$record['message'],
                'formatted' => $record['extra']['memory_peak_usage'],
            ],
        ];

        if (isset($record['extra']['controller_action'])) {
            $message['MemoryUsage']['controller_action'] = $record['extra']['controller_action'];
        }

        if (isset($record['extra']['command'])) {
            $message['MemoryUsage']['command_line'] = $record['extra']['command'];
        }

        if (isset($record['extra']['url'])) {
            $message['MemoryUsage']['url'] = $record['extra']['url'];
        }

        if (isset($record['extra']['http_method'])) {
            $message['MemoryUsage']['http_method'] = $record['extra']['http_method'];
        }

        if (isset($record['context']['job'])) {
            $message['MemoryUsage']['job_uuid'] = $record['context']['job']['uuid'];
            $message['MemoryUsage']['job_class'] = $record['context']['job']['class'];
        }

        return $this->toJson($message) . "\n";
    }
}
I'm not really sure where to begin debugging this. Have you ever seen anything like this before? The example malformed message above has been truncated in the middle of the MemoryUsage.url property: /projec has been cut from the beginning of the URL, along with everything before it in the record.
wild guess: multiple processes are writing in a non-atomic way to the same file and thus sometimes corrupting it?
I wondered that, but Windows uses exclusive file locking, so it shouldn’t even be possible for a process to write to a file until locks made by other processes have been released.
I'd say try upgrading to the latest Monolog. 2.3 fixed some issues with highly concurrent writes by increasing the stream chunk size, so that each log record is flushed to the file in one write instead of being written in smaller chunks, which can result in races like the one you're seeing.
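If upgrading has to wait, one thing that might be worth trying in the meantime is the $useLocking flag on the handler, which wraps each write in an exclusive flock(). A sketch against the factory you posted (the trailing constructor arguments are the 1.x defaults apart from the last one, so double-check the order against your version):

$handler = new RotatingFileHandler(
    $logFilesDirectory . "memory_usage.log",
    10,                   // max files
    MonologLogger::INFO,  // minimum level
    true,                 // $bubble (default)
    null,                 // $filePermission (default)
    true                  // $useLocking: take an exclusive lock around each write
);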
If that doesn't fix it, I'm not sure what else to suggest, sorry.
Thanks, I'll give that a go. It's on my TODO list anyway :)