mtail icon indicating copy to clipboard operation
mtail copied to clipboard

mtail doesn't follow log rotation when running under WSL and docker

Open JJassonn69 opened this issue 4 years ago • 4 comments

I am using prometheus and mtail to gather information from chia logs. It is working fine until the log file size is 20MB and new file log file is created, mtail then receives random data. I have to recompose the program to fix the issue but the same problem occurs when another log file is created.

I am using docker, VSL and prometheus. I do have mtail installed locally aswell but no idea how to implement it.

any help on how can i remedy it.

image image

JJassonn69 avatar May 21 '21 06:05 JJassonn69

The INFO log should help explain what mtail is doing at log rotation, and if you need more details, add -vmodule=tail=2 to the command line. Perhaps you can share the first lines of the INFO log and the section where the rotation occurs.

What's VSL stand for?

On Fri, 21 May 2021, 16:12 Jason, @.***> wrote:

I am using prometheus and mtail to gather information from chia logs. It is working fine until the log file size is 20MB and new file log file is created, mtail then receives random data. I have to recompose the program to fix the issue but the same problem occurs when another log file is created.

I am using docker, VSL and prometheus. I do have mtail installed locally aswell but no idea how to implement it.

any help on how can i remedy it.

[image: image] https://user-images.githubusercontent.com/83615043/119090547-6ca81880-ba14-11eb-8796-3ebc96346742.png [image: image] https://user-images.githubusercontent.com/83615043/119090681-982b0300-ba14-11eb-8286-941d0a681539.png

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXFX63LOERHJAEHYJHQHCLTOX2ULANCNFSM45IPPV7A .

jaqx0r avatar May 22 '21 02:05 jaqx0r

I am seeing the same issue with the same app. When the logrotate takes place mtail seems to re-ingest the original file, exploding the counters. This may be related on the implementation of the windows subsystem for Linux (WSL) that docker desktop uses to expose the log to the container as I run exectly the same setup under linux and the issue does not occur.

When running mtail directly on the windows host it works ... kind of as there mtail seems to simply prevent the logrotate of taking place in the first place :-)

I'am currently running mtail with the -v=2 and so i can post results when the rotate takes place.

pikolos avatar Jun 10 '21 09:06 pikolos

This isn't great at all! I assume that it's because I don't understand windows file semantics, and would appreciate help!

Sounds like we have two problems here, one is the pinning of closed files in windows itself, and the other is the behaviour when running under wsl.

If you all can get me verbose INFO logs per above that might help start the debugging process.

jaqx0r avatar Jun 11 '21 22:06 jaqx0r

Here is the log of the exact moment the log was rotated:

 [36mmtail         | [0m I0611 17:17:52.171393       1 tail.go:244] already got a logstream on "/var/log/chia/debug.log"
 [36mmtail         | [0m I0611 17:17:52.171178       1 filestream.go:175] &{0xc00008f320}: current seek is 20971559
 [36mmtail         | [0m I0611 17:17:52.171507       1 filestream.go:176] &{0xc00008f320}: new size is 167
 [36mmtail         | [0m I0611 17:17:52.171549       1 filestream.go:181] &{0xc00008f320}: truncate? currentoffset is 20971559 and size is 167
 [36mmtail         | [0m I0611 17:17:52.171663       1 filestream.go:191] &{0xc00008f320}: Seeked to 0
 [36mmtail         | [0m I0611 17:17:52.172390       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.172585       1 filestream.go:110] &{0xc00008f320}: decode and send
 [36mmtail         | [0m I0611 17:17:52.172687       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172744       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172861       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172924       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172941       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172958       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172989       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173096       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173125       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173164       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173188       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173353       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173410       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173493       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173548       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173584       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173601       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173620       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173649       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173664       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173683       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173738       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173753       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173772       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173807       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173827       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173851       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173905       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173924       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173946       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173990       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.174839       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.174984       1 filestream.go:110] &{0xc00008f320}: decode and send
 [36mmtail         | [0m I0611 17:17:52.175011       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175075       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175102       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175203       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175335       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175485       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175633       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175659       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175682       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175736       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175843       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175874       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176022       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176125       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176147       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176191       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176210       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176232       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176274       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176294       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176316       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.177252       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.177465       1 filestream.go:110] &{0xc00008f320}: decode and send

The rest of the logfile is a repetion 25xsendline, a read 4096 and a decode, seems like mtail went in a loop here for the next 45 gb of logfile. CPU for the mtail container was also at 100%. This went on on until the docker desktop host ran out of memory due to of trying to log 45gb :-)

It could be related to how docker desktop (windows) interact with the windows filesystem trought the WSL subsystem. However another container running promtail that was watching the same file had no issue with the rotate as seen in its log:

promtail  | ts=2021-06-11T17:17:52.0376528Z caller=log.go:124 level=info msg="Re-opening moved/deleted file /var/log/chia/debug.log ..."
promtail  | 2021/06/11 17:17:52 stat of old file returned, this is not expected and may result in unexpected behavior
promtail  | ts=2021-06-11T17:17:52.0400007Z caller=log.go:124 level=info msg="Waiting for /var/log/chia/debug.log to appear..."
promtail  | ts=2021-06-11T17:17:52.2932814Z caller=log.go:124 level=info msg="Successfully reopened /var/log/chia/debug.log"

If we look at the code of promtail of that warning stat of old file returned, this is not expected and may result in unexpected behavior there seems to be something going on with opening the old file at logrotate (line 210).

https://github.com/grafana/tail/blob/master/tail.go

It seems that the for some reason at logrotate the old file is still returned somehow and mtail goes into a loop. This might all be related to the implementation of docker desktop and its interaction with the local filesystem via WSL. WSL seems to use Plan9 for this. I am currently looking into other ways of mounting the log file to the container under windows.

Regardless of the filesystem thing there may be a way to fix how mtail handles this, at least looking at the comments in the code for promtail there is something going on with logrotate and the old file being returned.

pikolos avatar Jun 12 '21 11:06 pikolos