fluent-bit-kubernetes-logging
Unstable using latest from 0.13-dev branch
I am experiencing a lot of instability when applying the latest changes from the 0.13-dev branch, specifically #16.
Eventually, if a pod crashes on a busy node and enters CrashLoopBackOff, it never recovers. I am still investigating, but if you can see anything obvious, I would really appreciate your insight.
At first I thought it was the memory and/or CPU limits, so I removed those, and the crashes happen much less consistently. Without limits I'm still seeing what looks like multiple failure modes. I changed the namespace (to kangaroo) and the Kafka topic (to k8s-firehose), and set the Log_Level to debug. With Kube_URL pointing at kubernetes.default.svc I got a few "Temporary failure in name resolution" errors, so I changed it to kubernetes.default.svc.cluster.local and have not seen that error since.
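For reference, the relevant part of the Kubernetes filter after that change looks roughly like this (a sketch based on the repo's filter-kubernetes.conf; paths and port are the usual defaults, only Kube_URL differs):

[FILTER]
    Name                kubernetes
    Match               kube.*
    # default is https://kubernetes.default.svc:443; switched to the FQDN to avoid
    # the intermittent "Temporary failure in name resolution" errors
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token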
I am using kail to follow all the daemonset pods in parallel, but that's quite chatty, so I filter it down to errors with some context using:
kail --ds=fluent-bit | grep -A 10 -B 10 error
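In case kail isn't available, a rough equivalent with plain kubectl would be something like the following (the label selector is an assumption; adjust it and the namespace to match your manifests):

# follow every fluent-bit pod in parallel, prefixing each line with the pod name
for p in $(kubectl -n kangaroo get pods -l k8s-app=fluent-bit -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n kangaroo logs -f "$p" | sed "s|^|kangaroo/$p: |" &
done
wait

The whole group can be piped through the same grep -A 10 -B 10 error filter if needed.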
The output I get from kail is:
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (854 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (871 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [ou
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [sched] retry=0x7fdd38017938 1 in 234 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [sched] retry=0x7fbce1a17938 5 in 329 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [retry] re-using retry for task_id=2 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [sched] retry=0x7f2aa9a17a00 2 in 101 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [sched] retry=0x7f2aa9a179d8 1 in 749 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [sched] retry=0x7fbce1a17960 3 in 691 seconds
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (867 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (884 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [de
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
--
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-9e77c3d34cae27579fb2236fd361cc4d8d0f4018e2f1cb76a68a4d8f0b16b774.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=491254
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2778795
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1319893
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=284611
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] message delivered (1112 bytes, partition 4)
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [sched] retry=0x7fdd38017988 5 in 1123 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [sched] retry=0x7f2aa9a17a50 3 in 1609 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [sched] retry=0x7fdd380178e8 3 in 1386 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [sched] retry=0x7fbce1a178c0 1 in 1636 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
--
--
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [sched] retry=0x7fbce1a17960 3 in 888 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [retry] re-using retry for task_id=1 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [sched] retry=0x7f2aa9a179d8 1 in 1755 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [retry] re-using retry for task_id=0 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [sched] retry=0x7f2aa9a179b0 0 in 1741 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:03:56] [debug] [retry] r
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-1ee969753bd79567e97d12a8c82e10542c4b720db6d0aa1f78a4009c6d064920.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=515164
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2800532
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1346840
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=335508
Do you know if there is any best practice for using a DNS name to reach the API server? filter_kubernetes uses kubernetes.default.svc by default, but what about kubernetes.default.svc.cluster.local? (cc: @solsson)
I've seen a lot of both [service].[namespace] and names ending with .svc.cluster.local, but rarely names ending with .svc. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#a-records seems to recommend the full name. I don't know whether there are specific conventions for Kubernetes API access.
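A quick way to check which forms actually resolve from inside the cluster (assuming a throwaway busybox pod is acceptable):

kubectl -n kangaroo run -it --rm dns-check --image=busybox --restart=Never -- nslookup kubernetes.default.svc
kubectl -n kangaroo run -it --rm dns-check --image=busybox --restart=Never -- nslookup kubernetes.default.svc.cluster.local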
Thanks for the feedback, I will go ahead and change that.
@StevenACoffman is this still an issue?
@edsiper You mentioned a lack of "check" in out_kafka in #18
The instability I'm seeing seems completely attributable to the memory (and CPU?) limits on nodes with lots of pre-existing logs.
From @solsson on Jan 26, 2018:
Spikes in memory use at pod start are impractical. Can log processing be halted when kafka buffers hit a size limit? Would it be possible to add output buffer size to prometheus metrics?
From @leahnp on May 17, 2017:
Add a memory limit to the deployment yaml. Test the special case: in long-running clusters with lots of pre-existing logs, deploy Fluent Bit; the initial workload is very heavy and then it evens out. If it hits the memory limit during this initial processing it will be continually killed and re-created.
Copied from original issue: samsung-cnct/kraken-logging-fluent-bit-daemonset#5 and Moved to samsung-cnct/chart-fluent-bit#9
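For context, the limits under discussion sit in the DaemonSet's container spec and look roughly like this (values are placeholders, not necessarily this repo's defaults):

resources:
  limits:
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi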
@edsiper I think you can merge #18, as further increases in the limit would make no difference. Only caps on buffer sizes will.
What's the effect of Mem_Buf_Limit on the input plugin at start? The desired behavior for Tail would be that parsing stops temporarily. According to http://fluentbit.io/documentation/0.12/configuration/backpressure.html#membuflimit it can be set on output plugins too, but am I correct to interpret your earlier remarks as this having no effect because the Kafka client does the buffering? Maybe https://github.com/fluent/fluent-bit/issues/495 can help cap that, through queued.max.messages.kbytes etc.
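To make the two knobs concrete, here is a sketch of where they would sit in the config (values are illustrative, and the rdkafka.*-style pass-through assumes fluent/fluent-bit#495 is resolved by exposing librdkafka properties that way):

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # pause ingestion once this much data is buffered in the engine
    Mem_Buf_Limit     5MB

[OUTPUT]
    Name              kafka
    Match             *
    Brokers           bootstrap.kafka:9092
    Topics            k8s-firehose
    # librdkafka producer-queue cap discussed below (default is around 1 GB)
    rdkafka.queue.buffering.max.kbytes  10240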
@solsson merged, thanks.
Mem_Buf_Limit only applies to input plugins, to pause data ingestion into the engine. Since out_kafka buffers the data (it does not deliver it immediately), Fluent Bit issues an "OK", so in_tail keeps ingesting data. The fix is to add logic to out_kafka to really check whether a message was delivered.
If you see memory grow with a different output plugin, there is definitely something wrong; I will double-check the code anyway.
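A minimal sketch of what such a check could look like on the enqueue side (illustrative only, not the actual out_kafka code; confirming real delivery would additionally require librdkafka's delivery-report callback, rd_kafka_conf_set_dr_msg_cb):

#include <stddef.h>
#include <librdkafka/rdkafka.h>

/* Mirrors Fluent Bit's FLB_ERROR / FLB_OK / FLB_RETRY output return codes. */
enum flush_result { FLUSH_ERROR, FLUSH_OK, FLUSH_RETRY };

static enum flush_result produce_record(rd_kafka_topic_t *rkt,
                                        void *payload, size_t len)
{
    /* hand the record to librdkafka's producer queue */
    if (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                         payload, len, NULL, 0, NULL) == -1) {
        if (rd_kafka_last_error() == RD_KAFKA_RESP_ERR__QUEUE_FULL) {
            /* producer queue is full: report a retry so the engine applies
               back-pressure instead of acknowledging the chunk as delivered */
            return FLUSH_RETRY;
        }
        return FLUSH_ERROR;
    }
    return FLUSH_OK;
}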
I'm only running out_kafka. I will try out_kafka with queue.buffering.max.kbytes after the 0.13 release. The default seems to be 1 GB, so with https://github.com/fluent/fluent-bit/issues/495 I could set it to something like 10 MB instead. The docs say "Maximum total message size sum allowed on the producer queue." and "allowed" indicates that out_kafka would be notified when the max is reached.
Hmm... I pulled the latest, removed the CPU and memory limits, and I'm getting some CrashLoopBackOff pods terminated with exit code 139, which I think is a segmentation fault (128 + 11, SIGSEGV). There is no termination message. This is not on a node with excessive existing logs.
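The last termination state can be confirmed straight from the pod status (pod name here is just an example):

kubectl -n kangaroo get pod fluent-bit-xkv8g -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'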
The logs from fluent-bit look entirely normal. I tried changing the Log_Level to debug and deleted the old pod; when the new one is created it logs normal debug messages:
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (1075 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (1116 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (980 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (2110 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] message delivered (983 bytes, partition 0)
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (2111 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (983 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (997 bytes) for topic 'k8s-firehose'
Earlier in the fluent-bit pod's output, I am getting a lot of:
[2018/01/30 22:26:08] [debug] [filter_kube] could not merge log as requested
I cannot seem to get the pod on that node to come up healthy, regardless of restarts or terminate-and-recreate attempts, but the rest do.
I altered the ConfigMap and changed filter-kubernetes.conf to:
Merge_Log Off
K8S-Logging.Parser Off
After applying the change and deleting the pod, the DaemonSet recreated the pod and it came up healthy, after several hours of other failed attempts.
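For reference, the resulting filter section looks roughly like this (a sketch; everything except the last two lines is the usual filter-kubernetes.conf content):

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    # disabled to work around the repeated "could not merge log as requested" messages
    Merge_Log           Off
    K8S-Logging.Parser  Off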
FYI: 0.13-dev:0.7 is out:
https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev
FYI: 0.13-dev:0.9 is out:
https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev
I've upgraded and it looks good to me.
0.13-dev:0.9 is very solid so far (20 hours, large volume).