Versions newer than 4.5.1 don't mount extraVolumes in fluentd

Open frit0-rb opened this issue 1 year ago • 5 comments

Bugs should be filed for issues encountered whilst operating logging-operator. You should first attempt to resolve your issues through the community support channels, e.g. Slack, in order to rule out individual configuration errors. #logging-operator Please provide as much detail as possible.

Describe the bug: A clear and concise description of what the bug is.

I have logging-operator 4.5.1 deployed in an RKE2 Kubernetes cluster. When I upgrade to a newer version such as 4.5.3 or 4.5.6, the logs buffered in fluentd are never sent to Splunk.

Expected behaviour: A concise description of what you expected to happen.

The logs are sent to Splunk.

Steps to reproduce the bug: Steps to reproduce the bug should be clear and easily reproducible to help people gain an understanding of the problem.

Additional context: Add any other context about the problem here.

Environment details:

  • Kubernetes version (e.g. v1.15.2): RKE2 1.26.11

  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc):

  • logging-operator version (e.g. 2.1.1): 4.5.1

  • Install method (e.g. helm or static manifests): helm

  • Logs from the misbehaving component (and any other relevant logs):

    2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1580:in do_start'
    2024-04-26T15:41:19.729694776+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1575:in start'
    2024-04-26T15:41:19.729700576+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:662:in start'
    2024-04-26T15:41:19.729705876+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:602:in connection_for'
    2024-04-26T15:41:19.729711176+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:892:in request'
    2024-04-26T15:41:19.729717176+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:351:in write_to_splunk'
    2024-04-26T15:41:19.729722476+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:103:in block in write'
    2024-04-26T15:41:19.729727576+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/benchmark.rb:311:in realtime'
    2024-04-26T15:41:19.729732776+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:102:in write'
    2024-04-26T15:41:19.729738076+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:154:in write'
    2024-04-26T15:41:19.729763375+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in try_flush'
    2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in flush_thread_run'
    2024-04-26T15:41:19.729773975+02:00 2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in block (2 levels) in start'
    2024-04-26 13:41:19 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in block in thread_create'

  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:

/kind bug

frit0-rb avatar Apr 26 '24 13:04 frit0-rb

More info:

failed to flush the buffer. retry_times=3 next_retry_time=2024-04-26 14:57:56 +0000 chunk="61700e0f9e5790e5efb53ae6d92b1e5f" error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"

frit0-rb avatar Apr 26 '24 15:04 frit0-rb

I tried to update from 4.5.2 to 4.5.6. After the update, the fluentd pod logs show this error:

error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"

This is the log:

2024-04-30 09:49:49 +0000 [warn]: #0 [flow:gitlab:gitlab-to-splunk:output:gitlab:splunk-gitlab-dev] failed to flush the buffer. retry_times=9 next_retry_time=2024-04-30 09:58:20 +0000 chunk="6174d053bd6f5921236fadd5329cdb94" error_class=OpenSSL::SSL::SSLError error="SSL_CTX_load_verify_file: system lib"
2024-04-30T11:49:49.169562308+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in initialize'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in new'
2024-04-30T11:49:49.169585608+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1666:in connect'
2024-04-30T11:49:49.169600808+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1580:in do_start'
2024-04-30T11:49:49.169608108+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/net/http.rb:1575:in start'
2024-04-30T11:49:49.169615008+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:662:in start'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:602:in connection_for'
2024-04-30T11:49:49.169627208+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-4.0.2/lib/net/http/persistent.rb:892:in request'
2024-04-30T11:49:49.169659408+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:351:in write_to_splunk'
2024-04-30T11:49:49.169682307+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:103:in block in write'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/lib/ruby/3.2.0/benchmark.rb:311:in realtime'
2024-04-30T11:49:49.169695207+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk.rb:102:in write'
2024-04-30T11:49:49.169701407+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.3.3/lib/fluent/plugin/out_splunk_hec.rb:154:in write'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in try_flush'
2024-04-30T11:49:49.169713607+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in flush_thread_run'
2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in block (2 levels) in start'
2024-04-30T11:49:49.169726507+02:00 2024-04-30 09:49:49 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in block in thread_create'

I checked the release notes, but none of the changes appears to affect SSL or anything related.

frit0-rb avatar Apr 30 '24 09:04 frit0-rb

Good morning,

I found the root cause.

My current Logging definition includes an extraVolumes entry that mounts the CA certificates from the worker nodes' host into fluentd:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: &logging-app-dev gitlab-logging-dev
  namespace: cattle-logging-system
spec:
  loggingRef: *logging-app-dev
  fluentbit:
    security:
      roleBasedAccessControlCreate: true
  fluentd:
    security:
      roleBasedAccessControlCreate: true
      podSecurityContext:
        runAsNonRoot: false
    scaling:
      replicas: 3
    bufferStorageVolume:
      pvc:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    extraVolumes:
      - volumeName: trusted-cas-volume
        path: /home/fluent/certs
        containerName: fluentd
        volume:
          hostPath:
            path: /etc/pki/ca-trust/source/anchors
  controlNamespace: cattle-logging-system
  watchNamespaces:
    - gitlab

But when the Logging resource is created, this extra volume is never mounted inside the fluentd pods.

With the same config in 4.5.1, the extra volume was mounted correctly.
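For reference, a rough sketch of what the fluentd StatefulSet's pod template would be expected to contain if the extraVolumes entry above were applied. This is an illustration only, not output captured from a cluster; it assumes the operator reuses `volumeName` as the volume name and `path` as the mount path.

```yaml
# Illustrative fragment of the fluentd StatefulSet pod template (assumed shape,
# not captured from a real cluster).
spec:
  template:
    spec:
      containers:
        - name: fluentd
          volumeMounts:
            # Assumption: the mount is named after extraVolumes[].volumeName
            - name: trusted-cas-volume
              mountPath: /home/fluent/certs
      volumes:
        - name: trusted-cas-volume
          hostPath:
            path: /etc/pki/ca-trust/source/anchors
```

If neither the `volumes` entry nor the `volumeMounts` entry is present in the StatefulSet, fluentd cannot read the CA files under /home/fluent/certs, which would be consistent with the `SSL_CTX_load_verify_file` error.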

frit0-rb avatar May 03 '24 07:05 frit0-rb

I also tried adding a hostPath or a Secret through extraVolumes in FluentdConfig, and I got the same problem.

frit0-rb avatar May 03 '24 13:05 frit0-rb

@frit0-rb can you please use fenced code blocks so that we can see whitespace as well?

pepov avatar May 03 '24 15:05 pepov

> @frit0-rb can you please use fenced code blocks so that we can see whitespaces as well?

Sorry @pepov, I've added the fenced code blocks.

frit0-rb avatar May 06 '24 07:05 frit0-rb

thx, I've started to look into this, but I have some conflicting priorities, I have to ask for your patience

pepov avatar May 13 '24 12:05 pepov

> thx, I've started to look into this, but I have some conflicting priorities, I have to ask for your patience

No problem @pepov, we are not far behind the last stable release, so take your time.

frit0-rb avatar May 14 '24 06:05 frit0-rb

Hello @pepov, there is a new CVE affecting fluentbit: https://thehackernews.com/2024/05/linguistic-lumberjack-vulnerability.html I need to resolve this problem as soon as possible because I need to update to 4.6.0.

frit0-rb avatar May 21 '24 07:05 frit0-rb

You can use the latest fluentbit at any time without upgrading logging operator by setting the fluentbit image version explicitly.
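A minimal sketch of such an override, based on the Logging resource posted earlier in this thread. The `image.repository` and `image.tag` fields belong to the fluentbit spec; the tag shown is an assumption and should be replaced with whichever fluent-bit release you have verified as patched.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: gitlab-logging-dev
spec:
  controlNamespace: cattle-logging-system
  fluentbit:
    image:
      repository: fluent/fluent-bit
      # Assumption: example tag only; check the fluent-bit release notes
      # for the version that contains the CVE fix.
      tag: 3.0.4
```

The rest of the Logging spec (fluentd settings, watchNamespaces, and so on) stays unchanged; only the `fluentbit.image` block is added.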

pepov avatar May 21 '24 13:05 pepov

Looking at the code of statefulset.go, it seems both Volume and PersistentVolumeClaim must be specified. Not sure why. Also, mounting secrets or configmaps is not supported at all. See https://github.com/kube-logging/logging-operator/blob/61e6eb05c56c393cd929d96e66e4c39f346c4882/pkg/resources/fluentd/statefulset.go#L53 These lines were changed 5 months ago.

mgalesloot avatar Jun 24 '24 13:06 mgalesloot

thx @mgalesloot ! can someone help me verify this fixes the issue? https://github.com/kube-logging/logging-operator/pull/1765

Also this one from @nak0f (coming soon) will extend the support for configmaps: https://github.com/cisco-open/operator-tools/pull/251

pepov avatar Jun 30 '24 07:06 pepov

fyi I've updated the above PR with a sample that seems to fix this issue as I would expect

pepov avatar Jun 30 '24 13:06 pepov

Hi @pepov, this issue was closed, so in which version is it fixed? Which version do I need to update to in order to use extraVolumes?

frit0-rb avatar Jul 01 '24 11:07 frit0-rb

In the next release, which is going to be 4.8.

pepov avatar Jul 01 '24 11:07 pepov