
Fluent-bit input plugin tail doesn't process all logs: scan_blog add(): dismissed:

Open rguske opened this issue 3 years ago • 27 comments

Bug Report

Describe the bug: Fluent Bit is not processing all logs located in /var/log/containers/.

To Reproduce: The following messages are displayed:

[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scanning path /var/log/containers/*.log
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-3b631f27f6667599ae940f94afe6a65a4d1d488e7979fced513fa910082a5ae1.log, inode 404768
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-ca32320178170fe4198ce1b0bd57d8ea031c7c886a7b0e3d66bb1b78b67613b8.log, inode 921337
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-63659cdc8e5ddba3eaf729be280661b45fd198e6d2c7195965be85cdca81f41a.log, inode 536837
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log, inode 1190181
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-ovs-08045b767f2f8ee421b3b4d8d5b646b49b4e12199ae957cad178dd3d11670ec6.log, inode 663855
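Each dismissed entry is reported with the inode of the real log file: in_tail follows the /var/log/containers symlinks and tracks the target's inode, not the link's. A minimal sketch of that behavior (using a throwaway temp directory, not the real node paths):

```shell
# Sketch: in_tail dereferences symlinks, so the inode it logs is the
# inode of the symlink *target*.
tmp="$(mktemp -d)" && cd "$tmp"
echo "hello" > target.log
ln -s target.log link.log
stat -Lc '%i %n' link.log    # -L dereferences: prints the target's inode
stat -c  '%i %n' target.log  # same inode number as the line above
```

On a real node, the equivalent check would be `stat -Lc '%i %n' /var/log/containers/*.log`, compared against the inodes in the debug output above.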
  • Steps to reproduce the problem: deploy with the following configuration:

ServiceAccount:

rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - list
  - watch

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: vmware-system
  labels:
    k8s-app: fluent-bit
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
    [FILTER]
        Name                  modify
        Match                 kube.*
        Copy                  kubernetes k8s

    [FILTER]
        Name                  nest
        Match                 kube.*
        Operation             lift
        Nested_Under          kubernetes
  filter-record.conf: |
    [FILTER]
        Name                record_modifier
        Match               *
        Record tkg_cluster veba-demo.jarvis.tanzu
        Record tkg_instance veba-demo.jarvis.tanzu
    [FILTER]
        Name                  nest
        Match                 kube.*
        Operation             nest
        Wildcard              tkg_instance*
        Nest_Under            tkg

    [FILTER]
        Name                  nest
        Match                 kube_systemd.*
        Operation             nest
        Wildcard              SYSTEMD*
        Nest_Under            systemd
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     debug
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE input-systemd.conf
    @INCLUDE input-kube-apiserver.conf
    @INCLUDE input-auditd.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE filter-record.conf
    @INCLUDE output-syslog.conf
  input-auditd.conf: |
    [INPUT]
        Name              tail
        Tag               audit.*
        Path              /var/log/audit/audit.log
        Parser            logfmt
        DB                /var/log/flb_system_audit.db
        Mem_Buf_Limit     50MB
        Refresh_Interval  10
        Skip_Long_Lines   On
  input-kube-apiserver.conf: |
    [INPUT]
        Name              tail
        Tag               apiserver_audit.*
        Path              /var/log/kubernetes/audit.log
        Parser            json
        DB                /var/log/flb_kube_audit.db
        Mem_Buf_Limit     50MB
        Refresh_Interval  10
        Skip_Long_Lines   On
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  input-systemd.conf: |
    [INPUT]
        Name                systemd
        Tag                 kube_systemd.*
        Path                /var/log/journal
        DB                  /var/log/flb_kube_systemd.db
        Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
        Systemd_Filter      _SYSTEMD_UNIT=containerd.service
        Read_From_Tail      On
        Strip_Underscores   On
  output-syslog.conf: |
    [OUTPUT]
        Name   syslog
        Match  kube.*
        Host   10.197.79.57
        Port   514
        Mode   tcp
        Syslog_Format        rfc5424
        Syslog_Hostname_key  tkg_cluster
        Syslog_Appname_key   pod_name
        Syslog_Procid_key    container_name
        Syslog_Message_key   message
        Syslog_SD_key        k8s
        Syslog_SD_key        labels
        Syslog_SD_key        annotations
        Syslog_SD_key        tkg

    [OUTPUT]
        Name   syslog
        Match  kube_systemd.*
        Host   10.197.79.57
        Port   514
        Mode   tcp
        Syslog_Format        rfc5424
        Syslog_Hostname_key  tkg_cluster
        Syslog_Appname_key   tkg_instance
        Syslog_Message_key   MESSAGE
        Syslog_SD_key        systemd
  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        docker-daemon
        Format      regex
        Regex       time="(?<time>[^ ]*)" level=(?<level>[^ ]*) msg="(?<msg>[^ ].*)"
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        logfmt
        Format      logfmt

    [PARSER]
        Name        syslog-rfc5424
        Format      regex
        Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog-rfc3164-local
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
        Time_Keep   On

    [PARSER]
        Name        syslog-rfc3164
        Format      regex
        Regex       /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
        Time_Key    time
        Time_Format %b %d %H:%M:%S
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name    kube-custom
        Format  regex
        Regex   (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: vmware-system
  labels:
    k8s-app: fluent-bit
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
    spec:
      containers:
      - image: projects.registry.vmware.com/tkg/fluent-bit:v1.6.9_vmware.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 2020
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: fluent-bit
        ports:
        - containerPort: 2020
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v1/metrics/prometheus
            port: 2020
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 80m
            memory: 200Mi
          requests:
            cpu: 50m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: var-log
        - mountPath: /var/log/pods
          name: var-log-pods
        - mountPath: /var/log/containers
          name: var-log-containers
        - mountPath: /var/lib/docker/containers
          name: var-lib-docker-containers
          readOnly: true
        - mountPath: /fluent-bit/etc/
          name: fluent-bit-config
        - mountPath: /run/log
          name: systemd-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: fluent-bit
      serviceAccountName: fluent-bit
      terminationGracePeriodSeconds: 10
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /var/log
          type: ""
        name: var-log
      - hostPath:
          path: /var/log/pods
          type: ""
        name: var-log-pods
      - hostPath:
          path: /var/log/containers
          type: ""
        name: var-log-containers
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: var-lib-docker-containers
      - hostPath:
          path: /run/log
          type: ""
        name: systemd-log
      - configMap:
          defaultMode: 420
          name: fluent-bit-config
        name: fluent-bit-config
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Expected behavior: All logs in /var/log/containers/ should be processed.

Your Environment

  • Version used: 1.6.9, 1.7.9 and 1.8.9
  • Configuration: see above
  • Environment name and version: Kubernetes 1.20.2
  • Server type and version: single K8s instance; see https://github.com/vmware-samples/vcenter-event-broker-appliance
  • Operating System and version: VMware PhotonOS v4
  • Filters and plugins: @INCLUDE input-kubernetes.conf, @INCLUDE input-systemd.conf, @INCLUDE input-kube-apiserver.conf, @INCLUDE input-auditd.conf, @INCLUDE filter-kubernetes.conf, @INCLUDE filter-record.conf, @INCLUDE output-syslog.conf

Additional context: Running tail -f manually from within the system on a specific pod log (which is writing to stdout) works.

{"log":"10/03/2021 14:47:13 - Handler Processing Completed ...\n","stream":"stdout","time":"2021-10-03T14:47:13.829672574Z"}
{"log":"\n","stream":"stdout","time":"2021-10-03T14:47:13.829772103Z"}

Examples of logs which aren't processed:

root@veba-kn [ /var/log/containers ]# ls -rtl
total 376
lrwxrwxrwx 1 root root 100 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-agent/3.log
lrwxrwxrwx 1 root root  98 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-ovs-3f300f1d7b28c069df1f34cf37ff89be95d69fc3dc4ea0f269b5bd07ce5d56c1.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-ovs/3.log
lrwxrwxrwx 1 root root 102 Sep 13 21:31 envoy-89vct_contour-external_shutdown-manager-c8ed97927c25d465f31cce5ab8bd91d02742504f8cf73ad53e493738d0a17f74.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/shutdown-manager/3.log
lrwxrwxrwx 1 root root  91 Sep 13 21:31 envoy-89vct_contour-external_envoy-0ea7a33d12105058f74eae9653dd0266ac99ef2ba7f6cb3a3b04a8ec3bc02525.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/envoy/3.log
lrwxrwxrwx 1 root root 104 Sep 13 21:31 contour-5869594b-7jm89_contour-external_contour-803e6591f657fae9539b64ae4f86fa44cce99b409c5f92979c6045cf4b98b52c.log -> /var/log/pods/contour-external_contour-5869594b-7jm89_cc6cf243-7d3f-4839-91e8-741ab87f6488/contour/3.log
lrwxrwxrwx 1 root root 106 Sep 13 21:31 contour-5d47766fd8-n24mz_contour-internal_contour-ae34a8ae0b8398da294c5061ec5c0ef1e9be8cb2979f07077e5e9df12f2bab67.log -> /var/log/pods/contour-internal_contour-5d47766fd8-n24mz_a87131ad-d73a-4371-a47b-dcc410f3b6e4/contour/3.log
lrwxrwxrwx 1 root root 100 Sep 13 21:31 coredns-74ff55c5b-mjdlr_kube-system_coredns-60bd5f49def85a0ddc929e2c2da5c793a3c6de00cd6a81bdcfdb21f3d4f45129.log -> /var/log/pods/kube-system_coredns-74ff55c5b-mjdlr_7ef260c1-308e-4162-8a84-231d560f8023/coredns/3.log
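As a cross-check, the `kube-custom` regex from parsers.conf above does match these file names, so the dismissals are not a parser problem. A sketch applying it to one of the dismissed files (ported to Python's `(?P<name>)` group syntax; the optional tag group is dropped for brevity):

```python
import re

# kube-custom regex from parsers.conf, in Python named-group syntax.
pattern = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)"
    r"_(?P<container_name>.+)-(?P<docker_id>[a-z0-9]{64})\.log$"
)

name = ("antrea-agent-gql5r_kube-system_antrea-agent-"
        "8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log")
m = pattern.match(name)
print(m.group("pod_name"), m.group("namespace_name"), m.group("container_name"))
```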

I've also tried running the DaemonSet with

securityContext:
  privileged: true

Similar issues I found, but which don't provide the solution for this issue: #3857 #4014

Your help would be much appreciated. Thanks

rguske avatar Oct 03 '21 16:10 rguske

@rguske I also have this problem, how did you solve it?

zy286938348 avatar Nov 16 '21 02:11 zy286938348

@rguske I also have this problem, how did you solve it?

Hi @zy286938348, I have no clue at the moment, but I will follow some of the suggestions made here: https://fluent-all.slack.com/archives/C0CTQGHKJ/p1637005220493300 Please keep me posted if you have more luck finding the solution for this particular problem. Thanks

rguske avatar Nov 16 '21 07:11 rguske

Hi @zy286938348, which container runtime are you using?

rguske avatar Nov 19 '21 13:11 rguske

@rguske @zy286938348 I am also seeing this issue using containerd as my runtime. It came up ~16 days after my first deployment of fluentbit 1.8.1 on a fresh k8s 1.21 cluster on EKS. My fluentbit configuration is basically the default helm chart install, with the kubernetes parser, and that's about it. It looks like I can't access the Slack thread above, @rguske - let me know if I can share more info to help figure this out.

kc-dot-io avatar Nov 27 '21 02:11 kc-dot-io

I think this is the RCA for me: https://github.com/fluent/fluent-bit/issues/3873

When I removed Ignore_Older the issues I was seeing went away quite quickly.

Please tag me if you are able to try this and confirm it works for you too.
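For anyone comparing against the config above, the change amounts to making sure the tail input carries no Ignore_Older line. A sketch (values mirror the input-kubernetes.conf from this issue; the commented value is only an example of what to remove):

```ini
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    # Ignore_Older    1d   <- leave this out; with it set, files older than
    #                        the window can be dismissed by the scanner
```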

kc-dot-io avatar Nov 27 '21 02:11 kc-dot-io

Hi @kc-dot-io, I appreciate your comment and help. I wanted to update this issue last week already with my latest observations, but last week was a busy one. However, it's interesting that you mention seeing this issue with containerd, since I figured out that changing the runtime from Docker to containerd finally produced positive results. Consequently, I was pretty sure it's due to the old dockershim. Regarding the use of Ignore_Older - I'm sure I never added it to the ConfigMap, but I will double-check next week. Again, thanks a lot.

rguske avatar Nov 27 '21 19:11 rguske

I also have this issue. Upgrading Fargate 1.3 (using Docker as runtime) to 1.4 (using containerd) did not help.

teru01 avatar Dec 03 '21 09:12 teru01

Circling back to this and I'm no longer certain that Ignore_Older is the RCA for me & yes can confirm I'm using containerd.

The issue still appears to occur from time to time, and while it's usually not limited to a single node, I have observed the circumstance where one node works while another doesn't. I'm now testing increasing the resources allocated to the fluent-bit replicas, since the issue appears to come and go.

If anyone has any more ideas, I'm happy to troubleshoot it with my setup. Please do ping me in follow ups.

kc-dot-io avatar Dec 07 '21 01:12 kc-dot-io

Are there any updates on this issue? I am facing the same on a K8s 1.21 deployment with Fluent Bit v1.8.12 and Elasticsearch.

srideok2 avatar Feb 18 '22 19:02 srideok2

@srideok2 I don't think there is a permanent solution yet, but I've talked to a few other people who had issues, and we've had luck with reconfiguring our fluent-bit instances to get to a place where things were stable.

From what I can see the issue mostly happens when buffer sizes get exceeded due to large log lines.

The solution I used and worked for at least one other person was to turn on the option for skipping long lines and then jack up the buffer size on the inputs and filters.

IIRC there is another open issue that speaks to this at a lower level, and I think there are multiple different symptoms of the same root cause here, which is basically that when the memory buffer is exceeded, fluent-bit just silently stops working until it can be reset.

Ping me on Slack community if you have any questions. I hope this helps.
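Concretely, the knobs involved are the in_tail buffer options and the kubernetes filter's Buffer_Size. A sketch with placeholder values (not values anyone in this thread has validated):

```ini
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Skip_Long_Lines   On
    Buffer_Chunk_Size 1MB     # initial per-file read buffer
    Buffer_Max_Size   25MB    # ceiling the buffer may grow to for long lines
    Mem_Buf_Limit     50MB

[FILTER]
    Name        kubernetes
    Match       kube.*
    Buffer_Size 2MB           # HTTP buffer for pod metadata lookups
```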

kc-dot-io avatar Feb 18 '22 21:02 kc-dot-io

Hi all, it finally worked for our project after replacing Docker with Containerd and by applying the following config: https://github.com/vmware-samples/vcenter-event-broker-appliance/tree/development/files/configs/fluentbit/templates

rguske avatar Feb 20 '22 13:02 rguske

Has this been resolved? This seems to be related to the inode not being removed during the file removal process, i.e. the inode entry not being removed from files_static / files_event (https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_config.h#L129, https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_file.c#1108)

ankitkr avatar Mar 10 '22 13:03 ankitkr

@edsiper can you please give us some pointers to debug this issue? As I understand it, https://github.com/fluent/fluent-bit/blob/cd92cf36099e69c09b0448e6bc0032bd4ac81ac5/plugins/in_tail/tail_file.c#L861 - this method is returning -1 and the files are getting dismissed. Can you please help with some pointers on when this method returns -1?

ramatudaan avatar Mar 15 '22 13:03 ramatudaan

There is a bug in v1.8.13, we have just published the new container image for v1.8.14, please upgrade:

fluent/fluent-bit:1.8.14

edsiper avatar Mar 17 '22 23:03 edsiper

I'm using version 1.9.0 and I'm getting dismissed logs. This doesn't really seem fixed.

danielserrao avatar Mar 25 '22 08:03 danielserrao

Same issue here. Openshift 4 CoreOS and cri-o. Any input regarding this would be great.

andutt avatar Mar 25 '22 09:03 andutt

@edsiper can you please give us some pointers to debug this issue? As I have understood,

https://github.com/fluent/fluent-bit/blob/cd92cf36099e69c09b0448e6bc0032bd4ac81ac5/plugins/in_tail/tail_file.c#L861

this method is returning -1 and the files are getting dismissed. Can you please help with some pointers on when this method returns -1?

This got fixed when we started using 1.9.0. @danielserrao

ramatudaan avatar Mar 25 '22 09:03 ramatudaan

@ramatudaan I switched my fluent-bit image to 1.9.0 and my issue now seems to be security/permission related, or a misconfiguration of the templates to match Openshift 4 CoreOS/cri-o. I created an issue on it: here

andutt avatar Mar 25 '22 10:03 andutt

Hello guys, is there any ETA on this @edsiper? It sounds like a lot of issues have been opened about this topic, and Debian packages for 1.8.X are not available anymore. The 1.9.X packages do not fix the issue.

Lord-Y avatar Apr 22 '22 14:04 Lord-Y

Same issue guys, tail logging doesn't work

ahvahsky2008 avatar May 26 '22 09:05 ahvahsky2008

Have the same issue. Tried both the http output to Logstash and the es plugin to Elasticsearch

kashirindynatech avatar May 27 '22 12:05 kashirindynatech

fluent-bit version: 1.9.3
Problem: when collecting logs with fluent-bit's tail plugin, fluent-bit stops collecting logs after running for a while.
Root cause: when merging logs with multiline.parser, the cont state may merge several log entries into one line, causing tail to stop collecting.
Suggested approach: when multiline.parser merges logs, start_state is used first to decide whether a line is a start line, and cont then decides whether subsequent lines should be merged, so start_state and cont should use mutually exclusive logic.
Correct example:
rule "start_state" "/^(\d{4})(.*)/" "cont"
rule "cont" "/^(?!\d{4})(.*)/" "cont"
Wrong example:
rule "start_state" "/^(\d{4})(.*)/" "cont"
rule "cont" "/(.*)/" "cont"

So far I haven't run into the tail-stops-collecting problem again.
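In fluent-bit's classic-mode config, mutually exclusive start/continuation rules like those described above would look like this (a sketch; the parser name and flush_timeout are placeholders):

```ini
[MULTILINE_PARSER]
    name          multiline_sketch
    type          regex
    flush_timeout 1000
    # start_state matches lines beginning with 4 digits (e.g. a year);
    # cont uses a negative lookahead so it can never match a start line.
    rule      "start_state"  "/^(\d{4})(.*)/"     "cont"
    rule      "cont"         "/^(?!\d{4})(.*)/"   "cont"
```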

sxyandapp avatar Jun 10 '22 03:06 sxyandapp

Has anyone tried turning Skip_Long_Lines to Off? Some of my logs were dismissed because they were too long, I guess; this did the trick

Jonathan-w6d avatar Jul 27 '22 10:07 Jonathan-w6d

I'm having this issue as well on 1.9.6. The problem is that I NEED the long lines to be processed.

zulrang avatar Aug 05 '22 15:08 zulrang

@Jonathan-w6d yes, this helped for me, along with increasing the max buffer size. That said, you'll still run into an issue with a memory leak and eventually need to restart the process. I have to do this every 90-120 days.

@zulrang Try increasing your max buffer size to something very large - you'll probably still have to restart the processes every so often, but in theory, if the buffer is large enough, it should be able to process long lines until it can't again.

kc-dot-io avatar Aug 05 '22 16:08 kc-dot-io

@zulrang Try increasing your max buffer size to something very large - you'll probably still have to restart the processes every so often, but in theory, if the buffer is large enough, it should be able to process long lines until it can't again.

I have all buffers set to 2M and it has never successfully flushed chunks from one specific node - the one that is logging stack traces (about 8k long). The other 13 work fine.

zulrang avatar Aug 06 '22 00:08 zulrang

I'm having the same issue; I've tried all suggestions in this thread bar containerd. Tail is dismissing the one and only log I want it to tail. I have tried downgrading to 1.6.10 and upgrading to various versions above, up to latest. The problem occurs in every version I try.

EKS 1.21, docker, Amazon Linux 2 node group, latest helm install from bitnami

scostelloppb avatar Aug 10 '22 20:08 scostelloppb

I have all buffers set to 2M and it has never successfully flushed chunks from one specific node - the one that is logging stack traces (about 8k long). The other 13 work fine.

@zulrang did you try going higher? I went all the way up to ~25M using the tail input, k8s filter and skip long lines off before it was stable.

kc-dot-io avatar Aug 12 '22 00:08 kc-dot-io

Either there's something seriously wrong with FluentBit, or I'm fundamentally misunderstanding how it works, if it requires a 25M buffer to process a few 10k-long lines

zulrang avatar Aug 12 '22 00:08 zulrang

@kc-dot-io I have a 100M buffer, Skip_Long_Lines is Off, and yesterday I had dismissed lines again; the buffer was full and I had to do a rollout restart of fluent bit. I don't like this.

Jonathan-w6d avatar Aug 12 '22 07:08 Jonathan-w6d