Message ordering in multithreaded environment
I'm trying to figure out how does Quill order messages when there are two or more writers.
I think that existing benchmarks are a good point to start at. So I tried to run hot path latency test with system clock and examined produced log file.
However, it looks like message ordering is broken there. The following snippet is taken from produced log file (quill_hot_path_rdtsc_clock.log)
...
08:03:09.867062523 [377622] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 9999, message: 18, double: 10998.9
08:03:09.867062556 [377622] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 9999, message: 19, double: 10998.9
08:03:09.870180248 [377644] quill_hot_path_system_clock.cpp:67 INFO preallocate
08:03:09.870665108 [377644] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 0, double: 0
08:03:09.870665204 [377644] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 1, double: 0
08:03:09.870665239 [377644] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 2, double: 0
08:03:09.870665267 [377644] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 3, double: 0
08:03:09.870665342 [377644] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 4, double: 0
==> 08:03:09.870211474 [377643] quill_hot_path_system_clock.cpp:67 INFO preallocate
08:03:09.871261215 [377643] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 0, double: 0
08:03:09.871261296 [377643] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 1, double: 0
08:03:09.871261340 [377643] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 2, double: 0
08:03:09.871261369 [377643] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 3, double: 0
08:03:09.871261434 [377643] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 4, double: 0
==> 08:03:09.870234302 [377642] quill_hot_path_system_clock.cpp:67 INFO preallocate
08:03:09.871433994 [377642] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 0, double: 0
08:03:09.871434091 [377642] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 1, double: 0
08:03:09.871434145 [377642] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 2, double: 0
08:03:09.871434247 [377642] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 3, double: 0
08:03:09.871434308 [377642] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 4, double: 0
==> 08:03:09.870154786 [377645] quill_hot_path_system_clock.cpp:67 INFO preallocate
08:03:09.871832113 [377645] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 0, double: 0
08:03:09.871832286 [377645] quill_hot_path_system_clock.cpp:85 INFO Logging iteration: 0, message: 1, double: 0
...
So the message ordering looks broken at all lines containing 'preallocate' message except the first one. They all have timestamps that are earlier than previous message.
I use the latest Quill source code master/44bcb901 on Ubuntu 22.04.2 with 5.15 kernel running on 16 core Xeon CPU.
Does it mean that message ordering is really broken here?
Logs are ordered by timestamp in the backend thread. To avoid contention between threads, Quill uses a separate queue per thread instead of a global queue. Because of this design, the backend has to read from multiple queues and sort the messages by timestamp.
Sometimes, a thread might push a log message that gets processed while messages from other threads haven’t arrived yet. Since there are no other messages at that moment, this message is written to the log immediately. This can lead to slightly out-of-order timestamps. It’s more likely to occur in benchmarks, where all threads are logging in tight loops, rather than in typical real-world logging.
You can take a look at this test, which checks that log messages are ordered by timestamp:
https://github.com/odygrd/quill/blob/master/test/integration_tests/MultiFrontendThreadsTimestampOrderTest.cpp
To work around this in a stricter way, the above test sets ClockSourceType::System and a custom log_timestamp_ordering_grace_period:
https://github.com/odygrd/quill/blob/44bcb9011d2c35f86371a120823093290a85e4ed/test/integration_tests/MultiFrontendThreadsTimestampOrderTest.cpp#L26
More details here in the docs:
https://quillcpp.readthedocs.io/en/latest/users-api.html#_CPPv4N14BackendOptions35log_timestamp_ordering_grace_periodE