mtail keeps on running if producer crashes when reading from stdin

Open ema opened this issue 5 years ago • 6 comments

Hi!

Wikimedia uses mtail for various purposes, including exposing Varnish statistics. To do that, we have a very simple shell script called varnishmtail (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/files/varnishmtail). As you can see, the script boils down to varnishncsa | mtail -logs /dev/stdin. We used to run mtail with -logfds 0 and moved to -logs /dev/stdin when -logfds was removed (see this comment on mtail issue #3: https://github.com/google/mtail/issues/3#issuecomment-511341389).

Now, there's a problem we've discovered recently. If the producer dies (varnishncsa in this case), mtail keeps running normally, so the varnishmtail script keeps running too, and the systemd unit responsible for the whole thing does not notice anything. However, for all practical purposes the system is not functioning at that point, because the stats are no longer updated. See a more detailed description on our bug tracking system (https://phabricator.wikimedia.org/T259020).

I think that, in the special case where mtail is reading from stdin, receiving EOF should make the process exit. Thoughts?

ema, Jul 29 '20 09:07

That sounds like a good idea.

I wonder whether doing so has any weird effects, though. Does it need another flag to turn on that behaviour? I think it doesn't: if there are no other log files open to read and the FD closes, then there's nothing mtail can do afterwards.

In the interim, if you're using bash, can you turn on pipefail mode to make the shell kill mtail?

jaqx0r, Jul 30 '20 04:07

> I wonder whether doing so has any weird effects, though. Does it need another flag to turn on that behaviour? I think it doesn't: if there are no other log files open to read and the FD closes, then there's nothing mtail can do afterwards.

I also don't think another flag is needed.

> In the interim, if you're using bash, can you turn on pipefail mode to make the shell kill mtail?

That is the first thing I thought of doing too! However, pipefail is about the exit status of the pipeline, not about making any command stop. If mtail keeps on running, the shell waits for it whether pipefail is enabled or not.
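The distinction can be seen with a small sketch. The commands here are stand-ins and not part of the real setup: `false` plays the producer that has died, and `sleep 2` plays a consumer that does not react to its stdin being closed. The pipeline's status reflects the failed producer, but the shell still waits for the consumer to finish.

```shell
#!/usr/bin/env bash
# Stand-ins (assumptions for illustration only):
#   false   -> a producer that exits with an error right away
#   sleep 2 -> a consumer that ignores stdin and keeps running
set -o pipefail

start=$(date +%s)
false | sleep 2
status=$?
elapsed=$(( $(date +%s) - start ))

# pipefail makes the pipeline report the producer's non-zero status...
echo "pipeline status: $status"
# ...but only after the consumer has finished on its own.
echo "seconds waited for the consumer: $elapsed"
```

With a consumer that never exits, the line after the pipeline would never be reached, which is the situation described above.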

ema, Jul 30 '20 07:07
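Since pipefail cannot stop the consumer, one possible interim workaround is a wrapper that runs the producer in the foreground and kills the consumer once the producer exits. The following is only a sketch under stated assumptions, not what varnishmtail does: the `producer` and `consumer` functions are hypothetical stand-ins (in the real script they would be varnishncsa and mtail -logs /dev/stdin), and a named pipe replaces the anonymous pipe so that the shell knows the consumer's PID.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins so the sketch runs anywhere.
producer() { printf 'line 1\nline 2\n'; }   # writes two lines, then exits
consumer() { exec sleep 30; }               # never reacts to EOF on stdin

# A named pipe connects the two processes.
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/pipe"
mkfifo "$fifo"
trap 'rm -rf "$fifo_dir"' EXIT

# Start the consumer in the background, reading from the FIFO.
consumer <"$fifo" &
consumer_pid=$!

# Run the producer in the foreground; this returns when the producer exits.
producer >"$fifo"
producer_status=$?

# The producer is gone, so stop the consumer instead of waiting for it.
kill "$consumer_pid" 2>/dev/null
wait "$consumer_pid" 2>/dev/null
consumer_status=$?

echo "producer exited with $producer_status, consumer stopped with $consumer_status"
```

A real wrapper would probably end with a non-zero exit so that the systemd unit notices the failure and can restart the whole thing.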

As of HEAD right now I think that mtail will exit properly when stdin is closed if you only have one log (/dev/stdin) and you also add the flag --one_shot.

I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

If you can try it out in the meantime, that would be nice, but it's not necessary.

jaqx0r, Jan 15 '21 00:01

> I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

Please do this, because it looks like --one_shot does not support buckets, and I would like to use stdin with buckets.

Thanks!

shogos3, Mar 05 '21 08:03