Fluent-bit stops processing logs under high load, locks writing in ch_manager pipe
Bug Report
My fluent-bit 1.5.7 is running in a container in a k8s (AKS) environment. It's configured to collect docker logs (33 tail inputs), run them through a few filters, and send them to elasticsearch (33 outputs). Recently, as the amount of logs per node increased, fluent-bit started sporadically freezing up. The process keeps running, consuming 0% CPU and processing neither new logs nor the filesystem storage backlog, yet it still responds to monitoring queries on its http port. After some debugging with strace and gdb, I found that it locks up attempting to write to the ch_manager pipe. Here's the stack trace:
0x00007fac5aed74a7 in write () from target:/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007fac5aed74a7 in write () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000055a127ecad01 in flb_output_return (ret=1, th=0x7fac548cd240) at /tmp/fluent-bit/include/fluent-bit/flb_output.h:545
#2 0x000055a127ecade9 in flb_output_return_do (x=1) at /tmp/fluent-bit/include/fluent-bit/flb_output.h:576
#3 0x000055a127eccca1 in cb_es_flush (data=0x7fac5991e0b2, bytes=3842,
tag=0x7fac5437fd20 "kube.gws.gws-platform-datacollector-blue-6c474d7c84-wsdcw.gws-platform-datacollector-blue.2db22adbd3daaaf836a3f6311f4b3e5ad9ec7727280458ac68868419fb758ab9", tag_len=154,
ins=0x7fac5949e480, out_context=0x7fac56c6bc00, config=0x7fac59432c80) at /tmp/fluent-bit/plugins/out_es/es.c:748
#4 0x000055a127e72649 in output_pre_cb_flush () at /tmp/fluent-bit/include/fluent-bit/flb_output.h:449
#5 0x000055a1282a6907 in co_init () at /tmp/fluent-bit/lib/monkey/deps/flb_libco/amd64.c:117
#6 0x3039663238613165 in ?? ()
#
strace output (filtered to reads from fd 13 and writes to fd 14, which are the ch_manager pipe fds):
...
write(14, "\0\200\1\20\2\0\0\0", 8) = 8
read(13, "\0\0\36\20\2\0\0\0", 8) = 8
read(13, "\0\0\27\20\2\0\0\0", 8) = 8
read(13, "\0\300\37\20\2\0\0\0", 8) = 8
write(14, "\0\200\25\20\2\0\0\0", 8) = 8
read(13, "\0\200\17\20\2\0\0\0", 8) = 8
write(14, "\0\200\17\20\2\0\0\0", 8) = 8
write(14, "\0\0\27\20\2\0\0\0", 8) = 8
write(14, "\0\300\37\20\2\0\0\0", 8) = 8
read(13, "\0\0\33\20\2\0\0\0", 8) = 8
read(13, "\0\300\r\20\2\0\0\0", 8) = 8
write(14, "\0\0\36\20\2\0\0\0", 8) = 8
read(13, "\0\300\24\20\2\0\0\0", 8) = 8
write(14, "\0\300\r\20\2\0\0\0", 8) = 8
write(14, "\0\300\24\20\2\0\0\0", 8) = 8
read(13, "\0\200\1\20\2\0\0\0", 8) = 8
write(14, "\0\0\33\20\2\0\0\0", 8) = 8
read(13, "\0\200\25\20\2\0\0\0", 8) = 8
read(13, "\0\200\17\20\2\0\0\0", 8) = 8
write(14, "\0\200\17\20\2\0\0\0", 8) = 8
read(13, "\0\0\27\20\2\0\0\0", 8) = 8
write(14, "\0@\16\20\2\0\0\0", 8) = 8
write(14, "\0\200\1\20\2\0\0\0", 8) = 8
read(13, "\0\300\37\20\2\0\0\0", 8) = 8
read(13, "\0\0\36\20\2\0\0\0", 8) = 8
write(14, "\0\0\27\20\2\0\0\0", 8) = 8
write(14, "\0\0\36\20\2\0\0\0", 8) = 8
write(14, "\0\300\37\20\2\0\0\0", 8) = 8
read(13, "\0\300\r\20\2\0\0\0", 8) = 8
read(13, "\0\300\24\20\2\0\0\0", 8) = 8
read(13, "\0\0\33\20\2\0\0\0", 8) = 8
write(14, "\0\300\r\20\2\0\0\0", 8) = 8
read(13, "\0\200\17\20\2\0\0\0", 8) = 8
read(13, "\0@\16\20\2\0\0\0", 8) = 8
read(13, "\0\200\1\20\2\0\0\0", 8) = 8
write(14, "\0\300\24\20\2\0\0\0", 8) = 8
write(14, "\0\200\1\20\2\0\0\0", 8) = 8
write(14, "\0@\16\20\2\0\0\0", 8) = 8
read(13, "\0\0\27\20\2\0\0\0", 8) = 8
write(14, "\0\0\33\20\2\0\0\0", 8) = 8
read(13, "\0\0\36\20\2\0\0\0", 8) = 8
read(13, "\0\300\37\20\2\0\0\0", 8) = 8
write(14, "\0\0\27\20\2\0\0\0", 8) = 8
write(14, "\0\0\36\20\2\0\0\0", 8) = 8
read(13, "\0\300\r\20\2\0\0\0", 8) = 8
write(14, "\0\300\37\20\2\0\0\0", 8) = 8
read(13, "\0\300\24\20\2\0\0\0", 8) = 8
read(13, "\0\200\1\20\2\0\0\0", 8) = 8
read(13, "\0@\16\20\2\0\0\0", 8) = 8
write(14, "\0\200\1\20\2\0\0\0", 8) = 8
write(14, "\0\300\r\20\2\0\0\0", 8) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
It looks like the elasticsearch outputs may send so many responses at the same time that the pipe fills up and blocks in write(). But the inputs (and the pipe's reader) run in the same thread, so the responses can't be read from the pipe, and fluent-bit locks up.
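To make the suspected failure mode concrete, here is a minimal stand-alone sketch (plain POSIX C, not fluent-bit code): a single thread keeps writing 8-byte notifications to a pipe that only that same thread is supposed to read. Once the kernel pipe buffer (typically 64 KiB on Linux) is full, write() never returns, which is exactly the self-deadlock shown in the stack trace above.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int ch[2];
    char msg[8] = {0};
    long written = 0;

    if (pipe(ch) != 0) {
        perror("pipe");
        return 1;
    }

    for (;;) {
        /* roughly what the output's "I am done" notification does:
         * write an 8-byte value into the write end of the pipe */
        if (write(ch[1], msg, sizeof(msg)) != sizeof(msg)) {
            perror("write");
            break;
        }
        written += sizeof(msg);
        printf("wrote %ld bytes\n", written);
        /* the code that would read ch[0] lives in this same thread and
         * never runs, so after ~64 KiB the write() above blocks forever */
    }
    return 0;
}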
I produced a dirty fix by making the ch_manager pipe non-blocking. I also tried extending the size of the pipe (or at least reading its current size), but ioctl fails to get or set the pipe size. See the snippet below; the last line is the one I added.
flb_engine.c:

    /*
     * Create a communication channel: this routine creates a channel to
     * signal the Engine event loop. It's useful to stop the event loop
     * or to instruct anything else without break.
     */
    ret = mk_event_channel_create(config->evl,
                                  &config->ch_manager[0],
                                  &config->ch_manager[1],
                                  config);
    if (ret != 0) {
        flb_error("[engine] could not create manager channels");
        return -1;
    }
    flb_pipe_set_nonblocking(&config->ch_manager[1]); /* <----- I made it non-blocking ------- */
There's probably a cleaner way to fix it, but this one-liner worked for me. Now I get an occasional "resource not available" error in the log, but fluent-bit survives and keeps crunching logs.
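For reference, this is the generic Linux way to do the two things mentioned above, outside of the flb_pipe wrappers (a sketch, not fluent-bit code): make the write end non-blocking with fcntl(F_SETFL), and get/grow the pipe capacity with fcntl(F_GETPIPE_SZ / F_SETPIPE_SZ). The capacity is exposed through fcntl rather than ioctl, which may be why the ioctl attempt failed.

#define _GNU_SOURCE        /* needed for F_GETPIPE_SZ / F_SETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>

/* set O_NONBLOCK on a pipe fd so write() fails with EAGAIN instead of blocking */
static int pipe_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* report the current pipe capacity and try to grow it to 1 MiB */
static void pipe_report_and_grow(int fd)
{
    int size = fcntl(fd, F_GETPIPE_SZ);
    printf("pipe capacity: %d bytes\n", size);

    if (fcntl(fd, F_SETPIPE_SZ, 1024 * 1024) == -1) {
        perror("F_SETPIPE_SZ");
    }
}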
Environment
- Version used: 1.5.7 (container fluent/fluent-bit:1.5.7)
- Kubernetes: 1.16.13
- Docker: 3.0.10+azure
- OS: Ubuntu 16.04.1
Awesome, thanks @grep4error for all the help. @edsiper, this is affecting more people: https://github.com/fluent/fluent-bit/issues/2621 and https://github.com/fluent/fluent-bit/issues/2577. Please take a look; we can help validate a fix. cc @tomerleib
I've done two more tests in this area:
- Downgraded the image to 1.3.6 - there was a slight improvement: only one pod encountered the issue and froze; 2/3 pods of the DaemonSet continue to work.
- Removed all the tails and left a single input and a single stdout output - no issues; Fluent-Bit has been running for almost 3 days and nothing stopped.
@grep4error do you have an image that you can share with me and I will test it in my environment as well?
No change for me with 1.5.7 and this line; it still stopped processing after 20-30 minutes.
However, I created an image for 1.7.0 (yes, I know it's not released, but I forgot to change the branch) and everything worked for more than 14 hours.
We have the same issue; for now we will revert to 1.3.x, which does not seem to have this issue.
@mtparet which 1.3.x version have you tested? I tested with 1.3.7 and saw the same issue, though not everywhere (2/3 pods are running and 1 froze), but still.
It was 1.3.9. I cannot guarantee we had no freezes, but I did not observe any.
Tested 1.3.9; after 28 hours one of the fluent-bit pods froze again...
Hello @edsiper, do you acknowledge the issue, meaning a fix needs to be found in fluent-bit? Or should this not happen, and something else outside of fluent-bit is broken?
FYI: I am taking a look at this (WIP)
I am thinking about a solution. Making the pipe non-blocking is right, but the case where write() returns EAGAIN because the pipe is full needs extra care, since the notification from the output saying "I am done" or "I need a retry" would otherwise be missed.
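To illustrate the concern, a rough sketch of one way it could be handled (an assumption, not the actual fix; the helper name notify_or_queue is hypothetical): if write() returns EAGAIN, the 8-byte notification has to be parked in user space and re-sent later, because retrying in place cannot work when the only reader of the pipe is the same thread that is retrying.

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

struct pending_event {
    uint64_t val;
    struct pending_event *next;
};

static struct pending_event *pending = NULL;   /* notifications not yet delivered */

/* hypothetical helper: try to notify through the pipe; on a full pipe
 * (EAGAIN) keep the event in a user-space list so it is not lost */
static int notify_or_queue(int pipe_wr_fd, uint64_t val)
{
    ssize_t n = write(pipe_wr_fd, &val, sizeof(val));
    if (n == sizeof(val)) {
        return 0;                              /* delivered */
    }
    if (n == -1 && errno == EAGAIN) {
        struct pending_event *ev = malloc(sizeof(*ev));
        if (!ev) {
            return -1;
        }
        ev->val  = val;
        ev->next = pending;
        pending  = ev;                         /* re-sent once the event loop drains the pipe */
        return 1;
    }
    return -1;                                 /* real error */
}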
Work in progress
Indeed, at high load ch_manager gets saturated. The obvious solution seems to be a pool of pipes/channels for the output plugins. Until now we were abusing the internal event loop channel manager, which was quite OK up to a certain load. I will work on a POC with independent notification channels.
Thanks a lot for clarifying and for your work on that, @edsiper!
Indeed, at high load ch_manager gets saturated
Could increasing the allocated CPU be a workaround?
Not necessarily; it's better to have separate channels for that specific kind of notification.
@grep4error @avdhoot @tomerleib @mtparet @shirolimit @clebs
I've pushed a possible solution in the ch_manager branch; would you please build and test it? If you confirm the fix is good to go, I can merge it into the 1.6 series this week, but I need your help with proper feedback.
About the solution: every output plugin instance now has its own channels to notify events through the event loop, so it does not saturate the main engine ch_manager.
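As a rough illustration of that design (the ch_events field and helper name below are assumptions, not necessarily the exact code in the ch_manager branch), each output instance would create its own channel with the same mk_event_channel_create() call quoted earlier from flb_engine.c, registered against the engine event loop:

static int output_instance_channel_create(struct flb_config *config,
                                          struct flb_output_instance *ins)
{
    int ret;

    /* per-instance pipe: ch_events[0] is read by the event loop,
     * ch_events[1] is written by the output on flush completion,
     * so completions no longer funnel through config->ch_manager */
    ret = mk_event_channel_create(config->evl,
                                  &ins->ch_events[0],
                                  &ins->ch_events[1],
                                  ins);
    if (ret != 0) {
        flb_error("[output] could not create events channel");
        return -1;
    }
    return 0;
}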
Ping, any feedback?
I haven't taken the time to build/push/test it yet; I will.
@edsiper do you have any recommendations for a workaround when running on kubernetes? Detecting a freeze and restarting is basically a temporary solution. One option seems to be adding a file output and configuring a k8s liveness probe to verify that this file is regularly updated; if not, restart. Does that sound viable to you? Or maybe something like the following, relying on the prometheus metrics: https://github.com/fluent/fluent-bit/issues/2591
livenessProbe:
  exec:
    command:
      - /bin/bash
      - -exc
      - >
        function getOutTotalF0() { curl -s --connect-timeout 5 --retry 3 -f http://localhost:65012/api/v1/metrics/prometheus |grep '^fluentbit_output_proc_records_total{name="forward.0"}' | awk '{print $2}' ;};
        T=$(getOutTotalF0);test -e ./m_oprtf0 && { test $(cat ./m_oprtf0) -ge $T && { echo "OutputRecordsTotal Forward0 freeze: $T"; exit 1;} ;}; test $T -ge -10 && { echo "$T" >| ./m_oprtf0; } || true;
  initialDelaySeconds: 120
  periodSeconds: 60
  failureThreshold: 8
  timeoutSeconds: 5
Right now I do not have an environment with high load (pods coming and going every minute), so I am not able to validate the fix. Sorry. cc @edsiper
@edsiper sorry for the delay, I've started to test it. I will let it run for a couple of days; once I have some insight I will update you.
@edsiper We're running into that same issue. I'm building the Docker image and deploying it here too and will try to report back. If anyone wants to quickly consume it, we've published it to our registry:
registry.public.mtl1.vexxhost.net/openstack/fluent-bit:1.6.6-1
If you're using the Helm charts, this should do:
image:
  repository: registry.public.mtl1.vexxhost.net/openstack/fluent-bit
  tag: 1.6.6-1
@edsiper here's a quick update: high load, 3-node cluster. The DaemonSet has been running for 10 hours; currently 2/3 of the pods are still processing logs. 1 pod stopped processing logs after 8 hours, with nothing in its log and nothing in the K8s events for it. Another pod that is still running reported that it was terminated some time ago with error code 135.
Version 1.6.6 has been deployed in our production (13 big nodes) since yesterday; we will know in 2-3 days whether the fix is working for us.
I will start testing with v1.6.6 as well and see if it's working or not; I believe that by tomorrow I will know for sure whether at least one pod has stopped working.
So far, there are no more freezes on our side.
I will start testing it simultaneously, first with @mnaser's image. @mtparet, can you share the image that you're using? In my environment, when I built the image myself, it stopped working for this.
@tomerleib I am using version 1.6.6 with this configuration:
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_Tag_Prefix     kube.var.log.containers.
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log           On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Refresh_Interval  5
    Mem_Buf_Limit     6MB
    Skip_Long_Lines   On
    read_from_head    on
    DB                /tail-db/tail-containers-state.db
    DB.Sync           Off
    DB.locking        true

[OUTPUT]
    Name            es
    Match           *
    Host            XXXX
    Port            443
    Logstash_Format On
    Retry_Limit     5
    Type            _doc
    Trace_Error     true
    Time_Key        @timestamp-flb
    Replace_Dots    On
    HTTP_User       XXX
    HTTP_Passwd     XXXX
    tls             on
    tls.verify      on
Small clarification: when folks here say they're using 1.6.6, do they mean the upstream official 1.6.6 or our patched version? I think that would help the maintainers. I chose the tag 1.6.6-1 on the assumption that it is 1.6.6 + 1 patch.
Also, 1.6.7 was released officially a few hours ago too.
Now, 1.6.8 is out :D