conmon
conmon copied to clipboard
An Analysis of Conmon Container Logging Behaviors with Recommendations
A detailed analysis of the container logging behavior in conmon
as of tag v2.0.28
.
Summary recommendations:
- Match read size to actual
pipe(7)
size (64 KB by default on Linux)- This will reduce the number of partial log records emitted
- Align read buffer on page boundary, don't use leading byte just for remote consoles
- Consider reading from
pipe(7)
untilEWOULDBLOCK
- Raise IO vector count to 1,024 to reduce number of
writev(2)
system calls per bufferread(2)
- OR implement https://github.com/cri-o/cri-o/pull/1605 to avoid pathological behaviors
An Analysis of Conmon Container Logging Behaviors
Peter Portante, May 2021
Based on the v2.0.28
(GitHub tag) code base
-
conmon
uses non-blocking pipes for reading for thestdout
&stderr
pipes of a containera. Pipes created at lines 163 [3] and 178 [4] of
conmon.c
using thepipe2()
[5] system call, with theO_CLOEXEC
flag-
Pipes are 64 KB in size, the Linux default
-
Does
Go
change the size of the pipes dynamically?
b. Registers
stdio_cb()
, line 17 [1] ofctr_stdio.c
forpoll()
events -
-
In response to being invoked,
stdio_cb()
in turn callsread_stdio()
[2], which reads from the pipe file descriptor using aSTDIO_BUF_SIZE
sized buffer (+2 as described at [7])a. Where
STDIO_BUF_SIZE
is 8 KB [6], for a total of 8,192 bytes used in memoryb. [Flag] This means that 8 read system calls are required to drain the pipe
-
[Question] Does the
Go
integration change the pipe size to 8,192 bytes? -
The
read_stdio()
only makes oneread(2)
system call-
[Question] Why is only one
read(2)
system call made? Does that allow the polling loop to fairly round-robin between reading fromstdout
andstderr
? -
[Flag] We should consider having the
read_stdio()
continue reading untilread(2)
returnsEWOULDBLOCK
-
-
[Flag] The
STDIO_BUF_SIZE
should be 64 KB to allow the most efficient use of system calls- We should even consider increasing the size to the maximum allowed on the system to further increase the efficiency, if we can implement https://github.com/cri-o/cri-o/pull/1605.
c. [Flag] This also has the effect that 8 KB reads increases the number of "partial" log records emitted by
conmon
, since the pipe can fill up whileconmon
is processing a read.d. The read buffer is a stack local variable in
read_stdio()
, but the target address used in theread(2)
system call is moved by one byte [7], which means any reads returning >= 4 KB of data will result in 2 pages being used on the stack, with a maximum of 3 pages used- [Flag] This is only used for the remote consoles case [8], which means you could avoid the extra page use having
write_back_to_remote_consoles()
write the extra byte only when remote consoles are present
d. When 1 or more bytes are returned by the
read(2)
system call inread_stdio()
:-
write_to_logs()
is invoked to write the logs to disk, which in turn callswrite_journald()
andwrite_k8s_log()
-
For now, we ignore
write_journald()
for the rest of this analysis -
It is expected that the entire buffer returned by the
read(2)
system call is processed and written to disk
-
-
write_back_to_remote_consoles()
is called to satisfy any remote consoles- For now, we ignore this case for the rest of this analysis
-
-
Analysis of
write_k8s_log()
a. The approach taken by
write_k8s_log()
is to avoid memory copies of the data to be written to disk by using IO vectors with thewritev(2)
system callb. 128 vectors [9] are allocated [10] as a maximum to be used while processing any one buffer
c.
write_k8s_log()
creates a time stamp once at the beginning and uses it as the time stamp value for all the logs it processes and writesd. A "log" is defined to be all the bytes from the previous "newline" character up to, and including, the next newline character
-
At the end of the buffer, if no newlines are found, a special marker is used to record that a "partial" log record was processed
-
The start of a buffer assumes the beginning of a new log entry
e. When a new line is found, 3 vectors are used:
-
time stamp buffer [12], 44 bytes [11]
-
partial / full tag, 2 bytes [13]
-
actual log contents, N bytes [14]
f. A fourth vector is used when no "newline" characters are found in the remaining bytes of the buffer being processed
g. For every "newline" character encountered, 46 bytes are added to the output stream
h. 128 vectors can represent a maximum of 42 log entries (2 vectors unused)
-
For an 8,192 byte buffer, that is an average size of 195.048 bytes per line in order for one
writev()
system call to be used to write all bytes read to disk. -
If the average line length is >= 196 bytes, fewer vectors are required.
-
If the average line length is < 196 bytes and >= 98 bytes, 1 additional
writev()
system calls will be required: 2writev()
calls perread()
. -
If the average line length is < 98 bytes and >= 66 bytes, 2 additional
writev()
system calls will be required: 3writev()
calls perread()
. -
If the average line length is < 66 bytes and >= 49 bytes, 3 additional
writev()
system calls will be required: 4writev()
calls perread()
. -
Recommend 1,024 IO vectors to handle >= 25 byte average lines with 1 system call, and >= 13 byte average lines with 2 system calls
- The IO Vector data structure consists of a 4 byte
int
and an array of 16 bytestruct iovec
entries, takikng up 4 pages of the stack
- The IO Vector data structure consists of a 4 byte
-
Really, we should be implementing an alternate format of writing what was read from the pipe directly to disk so that the data stream from either pipe does not impact the behavior of
conmon
, see https://github.com/cri-o/cri-o/pull/1605 -
Here is a table how this works out as the average line length shrinks (byte ranges, and number of
writev(2)
system calls):
< 49 >= 40 5 < 40 >= 33 6 < 33 >= 28 7 < 28 >= 25 8 < 25 >= 22 9 < 22 >= 20 10 < 20 >= 18 11 17 12 16 13 15 & 14 14 13 16 12 17 11 18 10 20 9 22 8 25 7 28 6 33 5 40 4 49 3 66 2 98 1 196
-
[1] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L17 [2] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L101 [3] https://github.com/containers/conmon/blob/v2.0.28/src/conmon.c#L163 [4] https://github.com/containers/conmon/blob/v2.0.28/src/conmon.c#L178 [5] https://man7.org/linux/man-pages/man2/pipe.2.html [6] https://github.com/containers/conmon/blob/v2.0.28/src/config.h#L6 [7] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L103 [8] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_stdio.c#L131 [9] https://github.com/containers/conmon/blob/v2.0.28/src/utils.h#L224 [10] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L60 [11] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L22 [12] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L330 [13] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L342 [14] https://github.com/containers/conmon/blob/v2.0.28/src/ctr_logging.c#L349
I'll be updating the description to full mark down shortly.