fluent-bit: When the ingestion endpoint is not reachable, the health endpoint should return a 5xx HTTP error.

$ kubectl version
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5+IKS

Fluent Bit v3.1.4-ibm
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io/

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

Registering the logger-agent-plugin CommitSHA: e3e664b3cde6cd9f120036d6767cd0717f546b12
Registering the logger-icl-output-plugin with commitSHA: c257a37dc8119d8906e1be191998ed8d4a4beb3c
[2024/10/14 13:52:52] [ info] [fluent bit] version=3.1.4-ibm, commit=, pid=1
[2024/10/14 13:52:52] [ info] [storage] ver=1.5.2, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=192
[2024/10/14 13:52:52] [ info] [storage] backlog input plugin: storage_backlog.1
[2024/10/14 13:52:52] [ info] [cmetrics] version=0.9.1
[2024/10/14 13:52:52] [ info] [ctraces ] version=0.5.2

I have the Fluent Bit DaemonSet running in my Kubernetes cluster. When I open a shell in the logging pod, I see:

bash-5.1$ ps -Af
UID         PID   PPID  C STIME TTY          TIME CMD
10000         1      0  1 Oct14 ?        00:28:54 /fluent-bit/bin/fluent-bit --config=/fluent-bit/etc/fluent-bit.conf
10000        33      0  0 14:34 pts/0    00:00:00 /bin/bash
10000        44     33  0 14:34 pts/0    00:00:00 ps -Af

K8s pod config:

        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v1/health/
            port: 8081
            scheme: HTTP

However, when the configuration is wrong or a firewall blocks the ingestion endpoint, the readiness probe fails. If I open a shell in the pod:

bash-5.1$ curl localhost:8081/api/v1/health
curl: (7) Failed to connect to localhost port 8081: Connection refused

This is a misleading response.

If the process is up, the health endpoint should return 500 or similar, not "Connection refused". It would also help to add an HTTP reason header or a log line describing the actual configuration problem.

"Connection refused" should be reserved for severe cases where the process fails to start (for example, due to a null pointer exception) or crashes due to OOM.
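
To make the distinction above concrete, here is a minimal Python sketch of how a client could tell the outcomes apart: a 2xx answer, an HTTP error status such as 500, and a refused TCP connection. The function name `classify_health` and the label strings are hypothetical, chosen only for illustration. The sketch says nothing about how Fluent Bit itself decides health.

```python
import urllib.error
import urllib.request


def classify_health(url: str, timeout: float = 1.0) -> str:
    """Return a coarse label for what a health probe observed at `url`."""
    # Ignore any proxy settings from the environment so that a local
    # address is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"healthy (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the process and
        # its HTTP listener are up.
        return f"unhealthy (HTTP {exc.code})"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            # Nothing accepted the TCP connection on that port.
            return "no listener (connection refused)"
        return f"unreachable ({exc.reason})"
    except OSError as exc:
        # For example, a timeout while reading the response.
        return f"unreachable ({exc})"
```

Called with `http://localhost:8081/api/v1/health` from inside the pod, the curl result shown earlier would map to the "no listener" label: nothing accepted the connection on that port, which is a different condition from a server answering with a 5xx status.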

Oct 15 '24 14:10 taitelman

Please follow the template and provide all the relevant details required, including config, version, environment, etc.

I presume you're using this? https://docs.fluentbit.io/manual/administration/monitoring#health-check-for-fluent-bit

Oct 16 '24 10:10 patrick-stephens

Based on the Fluent Bit documentation, the health endpoint should behave as follows: "The health endpoint returns an HTTP status 500 and an error message. Otherwise, the endpoint returns HTTP status 200 and an ok message."

Oct 19 '24 06:10 taitelman

DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    version: 1.3.1
  creationTimestamp: "2024-03-17T11:24:21Z"
  generation: 57
  labels:
    app: logger-agent-ds
    version: 1.3.1
  name: logger-agent-ds
  namespace: ibm-observe
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: logger-agent-ds
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-09-12T11:13:58Z"
      creationTimestamp: null
      labels:
        app: logger-agent-ds
        name: logger-agent-ds
        version: 1.3.1
    spec:
      containers:
      - args:
        - --config=/fluent-bit/etc/fluent-bit.conf
        command:
        - /fluent-bit/bin/fluent-bit
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: observe/logs-router-agent:1.3.1
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v1/health/
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: fluent-bit
        ports:
        - containerPort: 2020
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v1/health/
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 701m
            ephemeral-storage: 10Gi
            memory: 3Gi
          requests:
            cpu: 100m
            ephemeral-storage: 2Gi
            memory: 1Gi
        securityContext:
          capabilities:
            add:
            - DAC_READ_SEARCH
          privileged: false
          runAsGroup: 10000
          runAsUser: 10000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/secrets/tokens
          name: vault-token
        - mountPath: /var/log
          name: varlog
          readOnly: true
        - mountPath: /var/data
          name: vardata
          readOnly: true
        - mountPath: /var/log/fluent-bit
          name: varlogfluentbit
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
        - mountPath: /fluent-bit/etc/
          name: logger-agent-config
        - mountPath: /fluent-bit/cache
          name: fluent-bit-cache
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: all-icr-io
      initContainers:
      - command:
        - scripts/make_db_dir.sh
        image: observe/logs-router-agent-init:1.3.1
        imagePullPolicy: Always
        name: create-db-dir
        resources: {}
        securityContext:
          privileged: true
          runAsGroup: 0
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/data
          name: vardata
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: logger-agent-sa
      serviceAccountName: logger-agent-sa
      terminationGracePeriodSeconds: 10
      tolerations:
      - operator: Exists
      volumes:
      - name: vault-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: iam
              expirationSeconds: 7200
              path: vault-token
      - hostPath:
          path: /var/log
          type: ""
        name: varlog
      - hostPath:
          path: /var/data
          type: ""
        name: vardata
      - hostPath:
          path: /var/log/fluent-bit
          type: ""
        name: varlogfluentbit
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - configMap:
          defaultMode: 420
          name: logger-agent-config
        name: logger-agent-config
      - emptyDir:
          sizeLimit: 11Gi
        name: fluent-bit-cache
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
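
One detail in the manifest above may be worth a second look: both probes target port 8081, while the only declared `containerPort` is 2020. The Python sketch below flags that kind of mismatch. The helper name `probe_ports_not_declared` is hypothetical, and the dictionary is a hand-trimmed copy of the container spec. Note that `containerPort` is informational in Kubernetes, so a mismatch proves nothing by itself. The `fluent-bit.conf` was not posted in this thread, so the port the HTTP server actually listens on (if it is enabled at all) remains an open question.

```python
def probe_ports_not_declared(container: dict) -> list[int]:
    """Return probe ports that are not among the container's declared ports."""
    declared = {p["containerPort"] for p in container.get("ports", [])}
    missing = []
    for key in ("livenessProbe", "readinessProbe", "startupProbe"):
        port = container.get(key, {}).get("httpGet", {}).get("port")
        if isinstance(port, int) and port not in declared and port not in missing:
            missing.append(port)
    return missing


# Trimmed copy of the fluent-bit container from the DaemonSet above.
container = {
    "name": "fluent-bit",
    "ports": [{"containerPort": 2020, "protocol": "TCP"}],
    "livenessProbe": {"httpGet": {"path": "/api/v1/health/", "port": 8081}},
    "readinessProbe": {"httpGet": {"path": "/api/v1/health/", "port": 8081}},
}

print(probe_ports_not_declared(container))  # prints [8081]
```

A result like this is only a hint to compare the probe port with the `HTTP_Port` value in the service configuration.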

Oct 19 '24 06:10 taitelman

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

Jan 19 '25 02:01 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

Feb 01 '25 02:02 github-actions[bot]
