Can FluentBit CRDs be namespaced?

Open alternaivan opened this issue 2 years ago • 18 comments

Describe the issue

Hello,

I'm trying to deploy ClusterOutput and ClusterParser to the specific namespace where the FluentBit Operator is deployed. However, the resources are being deployed at the cluster level, without the namespace.

Is it possible to deploy namespaced CRDs or not? The documentation says that the CRDs are cluster level, which I guess means no. However, the graph below the documentation shows the namespace value. If we cannot deploy namespaced CRDs, the graph is a bit misleading.

The documentation I'm referring to is this.

Thanks in advance!

How did you install fluent operator?

Installation was done via Helm.

Additional context

No response

alternaivan avatar Jan 23 '23 11:01 alternaivan

@alternaivan FluentBit is a namespaced CRD that controls the namespaced Fluent Bit DaemonSet. ClusterInput, ClusterParser, ClusterFilter, and ClusterOutput are all cluster-wide CRDs because FluentBit acts as a global agent that collects logs on each K8s node, which requires cluster-wide privileges.

We'll remove the namespace in the following graph: https://github.com/fluent/fluent-operator/blob/master/docs/images/fluent-bit-operator-workflow.svg

benjaminhuo avatar Jan 23 '23 12:01 benjaminhuo

We now have namespaced FluentBit CRDs in the operator, starting with v2.2.0.

adiforluls avatar May 12 '23 09:05 adiforluls

Hi @adiforluls,

Thanks for the update. I've tested it on my local cluster, and although CRDs such as Output are now namespaced, it seems that kind: Output is not working. I tested kind: ClusterOutput and kind: Output in parallel: the former works, but the latter doesn't. Am I missing something?

Below are both the Output and ClusterOutput definitions.

apiVersion: fluentbit.fluent.io/v1alpha2
kind: Output
metadata:
  labels:
    fluentbit.fluent.io/enabled: "true"
  name: es
  namespace: fluent
spec:
  es:
    bufferSize: 25M
    generateID: true
    host: "elasticsearch"
    index: fluent-bit-index
    logstashFormat: false
    port: 9200
    replaceDots: true
    timeKey: '@timestamp'
  matchRegex: (?:service)\.(.*)

...

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  labels:
    fluentbit.fluent.io/enabled: "true"
  name: es
spec:
  es:
    bufferSize: 25M
    generateID: true
    host: "elasticsearch"
    index: fluent-bit-index
    logstashFormat: false
    port: 9200
    replaceDots: true
    timeKey: '@timestamp'
  matchRegex: (?:service)\.(.*)

Thanks, Marjan

alternaivan avatar May 22 '23 08:05 alternaivan

Hi @adiforluls,

There was an error in my configuration. I was missing a namespace-level FluentBitConfig that matches the labels defined in the Output.
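
For reference, a minimal sketch of such a namespace-level FluentBitConfig might look like the following. The resource name and the label on the FluentBitConfig itself are hypothetical, and the `outputSelector` field is my reading of the v1alpha2 API rather than something confirmed in this thread, so check it against the CRD installed on your cluster.

```yaml
# Sketch only: the name and the metadata label are illustrative assumptions.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBitConfig
metadata:
  name: fluent-bit-config                  # hypothetical name
  namespace: fluent                        # same namespace as the Output above
  labels:
    fluentbit.fluent.io/enabled: "true"    # hypothetical; must match what the FluentBit resource selects on
spec:
  outputSelector:                          # assumed field name; verify with `kubectl explain`
    matchLabels:
      fluentbit.fluent.io/enabled: "true"  # matches the label on the Output above
```

With something like this in place, the namespaced Output is selected by label in the same way the ClusterOutput is selected by the cluster-level config.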

Sorry for the misunderstanding.

Thanks, Marjan

alternaivan avatar May 22 '23 12:05 alternaivan

@adiforluls how do the namespaced FluentBit resources actually work? If I create a FluentBit resource in a namespace, shouldn't Fluent Bit pods be deployed there? Currently I'm facing an issue where the pods get created only in the namespace where the operator is running. If some configuration is required to run the DaemonSet in another namespace, please let me know.

nemcikjan avatar May 29 '23 15:05 nemcikjan

The FluentBit resource is a namespaced resource that dictates various configurations of the fluent-bit daemonset. The daemonset instances on every node of the cluster will be created in the same namespace as the FluentBit custom resource. This resource doesn't offer any namespace level log isolation/treatment.

There is a FluentBitConfig resource where you can specify label selector values for namespaced Filter/Parser/Output resources in the same namespace as the FluentBitConfig resource.

The FluentBit resource has a label selector field called namespaceFluentBitCfgSelector that matches FluentBitConfig resources carrying the respective label across the namespaces in the cluster.
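
As a rough sketch of the FluentBit side of that wiring, assuming the FluentBitConfig resources carry a label such as `fluentbit.fluent.io/enabled: "true"` (the namespace, names, and label value here are illustrative, not taken from the chart):

```yaml
# Sketch only: namespace, names, and label values are hypothetical.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  namespace: fluent                        # the DaemonSet is created in this namespace
spec:
  fluentBitConfigName: fluent-bit-config   # hypothetical cluster-level config name
  namespaceFluentBitCfgSelector:           # label selector for FluentBitConfig resources
    matchLabels:
      fluentbit.fluent.io/enabled: "true"  # assumed label on each FluentBitConfig
```

The selector only decides which FluentBitConfig resources are picked up. It does not move the DaemonSet or isolate logs per namespace.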

adiforluls avatar Jun 07 '23 06:06 adiforluls

Hi, apologies in advance if I missed something obvious. We have been running the Fluent Bit operator and Fluent Bit via Helm 2.0.0 (fluent 1.8.3) without problems, collecting specific entries from var/messages. I have stepped through the upgrade to find out whether the namespaced CRDs introduced breaking changes, and no prerequisites for upgrading to 2.2 and above are mentioned anywhere...

  • 2.0.0 -> 2.0.1 = fine
  • 2.0.1 -> 2.1.0 = fine
  • 2.1.0 -> 2.2.0 = broken!

The operator is complaining that it cannot find the CRD FluentBitConfig. We have only ever used ClusterFluentBitConfig... Can anyone advise what needs to be implemented or configured before upgrading to 2.2.0 to ensure a working Fluent Bit operator?

2023-09-07T16:28:11Z ERROR controller-runtime.source if kind is a CRD, it should be installed before calling Start {"kind": "FluentBitConfig.fluentbit.fluent.io", "error": "no matches for kind "FluentBitConfig" in version "fluentbit.fluent.io/v1alpha2""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:143
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:136

This CRD is new in 2.2.

mkocaks avatar Sep 07 '23 16:09 mkocaks

> The operator is complaining that it cannot find the CRD FluentBitConfig. We have only ever used ClusterFluentBitConfig... Can anyone advise what needs to be implemented or configured before upgrading to 2.2.0 to ensure a working Fluent Bit operator?

You can open a new issue to discuss this. You can also list the CRDs on the server and check whether that CRD exists.

wenchajun avatar Sep 08 '23 08:09 wenchajun

Could you also share your values.yaml file?

adiforluls avatar Sep 08 '23 09:09 adiforluls

Thank you, here are our values. They are simple: we deploy the operator via helm terraform operator, and the others (input, output, parser, and config for Fluent Bit) are done via the terraform kubectl manifest provider...

Some entries are redacted with xxxxx.

containerRuntime: containerd
Kubernetes: false

operator:
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 60Mi

Below is the yaml for fluentbit:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  labels:
    app.kubernetes.io/name: fluent-bit
  image: kubesphere/fluent-bit:v${version}
  positionDB:
    hostPath:
      path: /var/lib/fluent-bit/
  resources:
    requests:
      cpu: 10m
      memory: 25Mi
    #limits:
    #  cpu: 500m
    #  memory: 200Mi
  fluentBitConfigName: fluentbit-config
  tolerations:
    - key: xxxxxx
      operator: Equal
      value: e4sv4
      effect: NoSchedule
    - key: xxxxx
      operator: Equal
      value: e8sv4
      effect: NoSchedule
    - key: xxxxxx
      operator: Equal
      value: e16sv4
      effect: NoSchedule

mkocaks avatar Sep 08 '23 12:09 mkocaks

This was working fine until the upgrade to 2.2, as mentioned above.

mkocaks avatar Sep 08 '23 12:09 mkocaks

Hi, any other advice here? There are breaking changes in 2.2.0 and above, which end up with the operator throwing this error. It looks like a bug in the CRD.

if kind is a CRD, it should be installed before calling Start {"kind": "FluentBitConfig.fluentbit.fluent.io", "error": "no matches for kind "FluentBitConfig" in version "fluentbit.fluent.io/v1alpha2""} sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1

Could not wait for Cache to sync {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "error": "failed to wait for fluentbit caches to sync: timed out waiting for cache to be synced"

mkocaks avatar Sep 20 '23 10:09 mkocaks

> Hi, apologies in advance if I missed something obvious. We have been running the Fluent Bit operator and Fluent Bit via Helm 2.0.0 (fluent 1.8.3) without problems, collecting specific entries from var/messages. I have stepped through the upgrade to find out whether the namespaced CRDs introduced breaking changes, and no prerequisites for upgrading to 2.2 and above are mentioned anywhere...

It seems to me that this may be caused by a feature added in version 2.2: https://github.com/fluent/fluent-operator/pull/621

wenchajun avatar Sep 20 '23 10:09 wenchajun

It is! We only use ClusterFluentBitConfig, so I will try updating the CRDs according to this manual: https://github.com/fluent/fluent-operator#deploy-fluent-operator-with-helm I will report back...

mkocaks avatar Sep 20 '23 10:09 mkocaks

Thank you, that was the issue... I had to replace all the CRDs!

mkocaks avatar Sep 20 '23 11:09 mkocaks

@wenchajun @benjaminhuo it looks like the enable-CRDs feature has broken upgrades of fluent-operator. I experienced this too: a simple helm upgrade no longer updates the CRDs, although helm install has no problems (i.e. the CRDs are applied).

adiforluls avatar Oct 11 '23 15:10 adiforluls

> @wenchajun @benjaminhuo it looks like the enable-CRDs feature has broken upgrades of fluent-operator. I experienced this too: a simple helm upgrade no longer updates the CRDs, although helm install has no problems (i.e. the CRDs are applied).

@Kristian-ZH added this great enhancement in https://github.com/fluent/fluent-operator/pull/621. It looks like we need some adjustments here.

benjaminhuo avatar Oct 11 '23 15:10 benjaminhuo

The workaround at the moment is just to update the CRDs manually, I guess (that worked for me).

mkocaks avatar Oct 11 '23 16:10 mkocaks