broken connection/ failed to flush chunk

Open jayenzo opened this issue 3 years ago • 8 comments

Describe the bug

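After configuring the following ClusterOutput and ClusterInput (Fluent Bit resources), logs are not delivered to Elasticsearch, even though the endpoint is reachable with curl and telnet: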

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: es
  labels:
    fluentbit.fluent.io/enabled: "true"
    fluentbit.fluent.io/component: logging
spec:
  matchRegex: (?:kube|service).(.*)
  es:
    host: elasticsearch-tools.internal
    port: 9200
    generateID: true
    logstashPrefix: ks-logstash-log
    logstashFormat: true
    timeKey: "@timestamp"
    httpUser:
      valueFrom:
        secretKeyRef:
          key: username
          name: es-secret
    httpPassword:
      valueFrom:
        secretKeyRef:
          key: password
          name: es-secret

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  name: tail
  labels:
    fluentbit.fluent.io/enabled: "true"
    fluentbit.fluent.io/component: logging
spec:
  tail:
    tag: kube.*
    path: /var/log/containers/*.log
    parser: docker
    refreshIntervalSeconds: 10
    memBufLimit: 5MB
    skipLongLines: true
    db: /fluent-bit/tail/pos.db
    dbSync: Normal

curl -u "elastic:8HfQ3f4sWd5YB165JZr40Xw3" -k "https://elasticsearch-tools.internal:9200"
{
  "name" : "elasticsearch-outils-es-master-0",
  "cluster_name" : "elasticsearch-tools",
  "cluster_uuid" : "jOPcyYEATJGnH6ALi3vCRw",
  "version" : {
    "number" : "8.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
    "build_date" : "2022-03-03T14:20:00.690422633Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

telnet elasticsearch-tools.internal 9200
Trying 192.168.34.202...
Connected to elasticsearch-outils.lait.qc.ca.
Escape character is '^]'.

To Reproduce

Apply the ClusterOutput and ClusterInput manifests shown under "Describe the bug" above.

Expected behavior

Logs should be sent to Elasticsearch.

Your Environment

- Fluent Operator version:
- Container Runtime:
- Operating system:
- Kernel version:

How did you install fluent operator?

No response

What happened?

No response

Your Error Log

[2022/04/04 03:32:51] [ warn] [engine] failed to flush chunk '12-1649043167.572082150.flb', retry in 6 seconds: task_id=3, input=tail.0 > output=es.0 (out_id=0)
[2022/04/04 03:32:51] [error] [http_client] broken connection to elasticsearch-tools.internal 9200 ?

Additional context

I'm not sure what the issue could be; I'm able to telnet to and curl the endpoint.

jayenzo avatar Apr 04 '22 03:04 jayenzo

Not too sure what I'm missing?

jayenzo avatar Apr 04 '22 03:04 jayenzo

Hi @jayenzo. Can you please enable the debug log level and share the log? https://github.com/fluent/fluent-operator/blob/master/apis/fluentbit/v1alpha2/clusterfluentbitconfig_types.go#L66

wenchajun avatar Apr 04 '22 05:04 wenchajun
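For readers following along, the debug log level requested above could be enabled roughly as follows. This is a minimal sketch, assuming the `service` section of ClusterFluentBitConfig exposes a `logLevel` field as the linked types file suggests; the rest of the manifest simply mirrors the config the reporter posts in this thread.

```yaml
# Sketch only: ClusterFluentBitConfig with debug logging enabled.
# `logLevel` is assumed from the linked Go type; verify it against your installed CRD.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
  name: fluent-bit-config
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  service:
    logLevel: debug            # assumed field; raises Fluent Bit log verbosity
    parsersFile: parsers.conf
  inputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  filterSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  outputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
```

Once the operator regenerates the Fluent Bit configuration, the pod logs should contain more detail about the failed connection.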

This is my configmap @wenchajun

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
  name: fluent-bit-config
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  service:
    LogFile:
    parsersFile: parsers.conf
  inputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  filterSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  outputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"

[2022/04/04 13:53:22] [ warn] [engine] chunk '15-1649080390.398710649.flb' cannot be retried: task_id=10, input=tail.0 > output=es.0
[2022/04/04 13:53:22] [error] [http_client] broken connection to elasticsearch-outils.lait.qc.ca:9200 ?
[2022/04/04 13:53:22] [ warn] [output:es:es.0] http_do=-1 URI=/_bulk
[2022/04/04 13:53:22] [ warn] [engine] chunk '15-1649080304.543997360.flb' cannot be retried: task_id=6, input=tail.0 > output=es.0

jayenzo avatar Apr 04 '22 13:04 jayenzo

You can debug it like this: exec into a pod and check whether you can reach the Elasticsearch host. If the network is fine, change the Fluent Bit image to kubesphere/fluent-bit:v1.7.3 and see whether any corresponding errors appear.

wenchajun avatar Apr 04 '22 15:04 wenchajun
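With the operator managing the DaemonSet, the image swap suggested above would normally be made on the FluentBit custom resource rather than on the DaemonSet itself. A minimal sketch, assuming the resource is named `fluent-bit` (hypothetical) and lives in the `kubesphere-logging-system` namespace that appears elsewhere in this thread:

```yaml
# Sketch only: point the operator-managed Fluent Bit at a different image.
# The metadata name is an assumption; use the name of your existing FluentBit resource.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit                        # hypothetical name
  namespace: kubesphere-logging-system    # namespace taken from the kubectl command in this thread
spec:
  image: kubesphere/fluent-bit:v1.7.3     # image suggested in the comment above
  fluentBitConfigName: fluent-bit-config  # matches the ClusterFluentBitConfig posted by the reporter
```

Any other fields already present on the existing resource (tolerations, position DB, and so on) would need to be kept when applying a change like this.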

docker inspect 7bc4f5e76baf
[
  {
    "Id": "sha256:7bc4f5e76baf8b1470a8aa124f900dbdb262535189a703ec92f335e5150a45fa",
    "RepoTags": [
      "kubesphere/fluent-bit:v1.8.11"
    ],
    "RepoDigests": [
      "kubesphere/fluent-bit@sha256:b17510fd77513ffa8592637f61a35b1f0e8d348d1520f39df30580c41868c43f"
    ],
    "Parent": "",
    "Comment": "buildkit.dockerfile.v0",
    "Created": "2022-03-25T10:40:17.809581417Z",
    "Container": "",
    "ContainerConfig": {
      "Hostname": "",
      "Domainname": "",
      "User": "",
      "AttachStdin": false,
      "AttachStdout": false,
      "AttachStderr": false,
      "Tty": false,
      "OpenStdin": false,
      "StdinOnce": false,
      "Env": null,
      "Cmd": null,
      "Image": "",
      "Volumes": null,
      "WorkingDir": "",
      "Entrypoint": null,
      "OnBuild": null,
      "Labels": null
    },
    "DockerVersion": "",
    "Author": "Eduardo Silva [email protected]",
    "Config": {
      "Hostname": "",
      "Domainname": "",
      "User": "0",
      "AttachStdin": false,
      "AttachStdout": false,
      "AttachStderr": false,
      "ExposedPorts": {
        "2020/tcp": {}
      },
      "Tty": false,
      "OpenStdin": false,
      "StdinOnce": false,
      "Env": [
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
      ],
      "Cmd": null,
      "Image": "",
      "Volumes": null,
      "WorkingDir": "/",
      "Entrypoint": [
        "/fluent-bit/bin/fluent-bit-watcher"
      ],
      "OnBuild": null,
      "Labels": {
        "Description": "Fluent Bit docker image",
        "Vendor": "Fluent",
        "Version": "1.0"
      }
    },

I'm unable to exec into the container (Cmd is null). Is there an alternative way to troubleshoot the issue?

jayenzo avatar Apr 04 '22 19:04 jayenzo

@wenchajun

kubectl exec -i -t -n kubesphere-logging-system fluentd-forward-0 -c fluentd -- sh -c "clear; (bash || ash || sh)"
/usr/bin $ nc -zv elasticsearch-tool.internal 9200
elasticsearch-tool.internal (192.168.34.202:9200) open

2022-04-04 20:36:16 +0000 [warn]: #0 [FluentdConfig-kubesphere-logging-system-fluentd-config::kubesphere-logging-system::output::fluentd-stdout-0] Could not communicate to Elasticsearch, resetting connection and trying again. EOFError (EOFError)
2022-04-04 20:36:16 +0000 [warn]: #0 [FluentdConfig-kubesphere-logging-system-fluentd-config::kubesphere-logging-system::output::fluentd-stdout-0] Remaining retry: 14. Retry to communicate after 2 second(s).
2022-04-04 20:36:20 +0000 [warn]: #0 [FluentdConfig-kubesphere-logging-system-fluentd-config::kubesphere-logging-system::output::fluentd-stdout-0] Could not communicate to Elasticsearch, resetting connection and trying again. EOFError (EOFError)

jayenzo avatar Apr 04 '22 21:04 jayenzo

This seems to show that Fluentd cannot connect to Elasticsearch.

wenchajun avatar Apr 05 '22 08:04 wenchajun

I think you should make sure TLS is disabled in your Elasticsearch settings.

Macrow avatar Mar 28 '23 16:03 Macrow
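One reading of the thread, not confirmed by the reporter: the curl test used `https://` with `-k`, while the ClusterOutput carries no TLS settings, so Fluent Bit may have been speaking plain HTTP to an HTTPS endpoint. If disabling TLS on Elasticsearch is not an option, the alternative is to enable TLS on the output side. A minimal sketch, assuming the operator's `es` output accepts a `tls` block with a `verify` field (check the CRD for your operator version):

```yaml
# Sketch only: the reporter's ClusterOutput with TLS enabled on the es output.
# The `tls` block and its `verify` field are assumptions about the CRD schema.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: es
  labels:
    fluentbit.fluent.io/enabled: "true"
    fluentbit.fluent.io/component: logging
spec:
  matchRegex: (?:kube|service).(.*)
  es:
    host: elasticsearch-tools.internal
    port: 9200
    generateID: true
    logstashPrefix: ks-logstash-log
    logstashFormat: true
    timeKey: "@timestamp"
    tls:
      verify: false   # assumed field; skips certificate verification, similar to curl -k
    httpUser:
      valueFrom:
        secretKeyRef:
          key: username
          name: es-secret
    httpPassword:
      valueFrom:
        secretKeyRef:
          key: password
          name: es-secret
```

Skipping verification is only reasonable for troubleshooting; for a permanent setup, supplying the CA certificate and leaving verification on would be the safer choice.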