help request: Why are cluster-wide permissions required for some resources such as DaemonSets, Secrets & more?

Open boatski opened this issue 1 year ago • 7 comments

Describe the issue

I am looking to roll out Fluent Operator in an enterprise Kubernetes cluster to handle various telemetry needs for 100+ services. The configuration-related CRDs appear to let our service owners extend our telemetry pipeline for their own needs, such as additional outputs. That's the main reason Fluent Operator stood out to me.

Security is a concern because the operator is given full cluster-wide permissions on DaemonSets, StatefulSets, Secrets, ServiceAccounts, and more. Given that these resources are namespace-scoped, what functionality does the operator provide that requires cluster-wide permissions on them?

I've attempted to reduce the permission scope on these resources to a single namespace using Roles/RoleBindings, but the operator attempts to list/watch these resources across the entire cluster.

For example, I moved the permissions below from the ClusterRole to a namespaced Role (with the appropriate bindings). If the operator lives in a single namespace, along with FluentBit or Fluentd, then I would not expect it to need to monitor the entire cluster for these resources.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: telemetry
  name: fluent-operator
rules:
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - statefulsets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterrolebindings
    verbs:
      - create
      - list
      - get
      - watch
      - patch
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
    verbs:
      - create
      - list
      - get
      - watch
      - patch
  - apiGroups:
      - ""
    resources:
      - secrets
      - configmaps
      - services
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
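The bindings themselves are not shown in this report. As a minimal sketch, a RoleBinding for the Role above could look like the following. The ServiceAccount name and namespace are taken from the log output further down (`system:serviceaccount:telemetry:fluent-operator`); the binding name is an assumption.

```yaml
# Sketch only: binds the namespaced Role above to the operator's ServiceAccount.
# The binding name "fluent-operator" is assumed, not taken from the chart.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: telemetry
  name: fluent-operator
subjects:
  - kind: ServiceAccount
    name: fluent-operator
    namespace: telemetry
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fluent-operator
```

Note that `clusterroles` and `clusterrolebindings` are cluster-scoped resources, so rules for them inside a namespaced Role grant no access; those two rules would have to remain in a ClusterRole.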

The logs below are from the operator once those permissions were scoped to a single namespace.

fluent-operator W0802 20:15:54.123352       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.DaemonSet: daemonsets.apps is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "daemonsets" in API group "apps" at the cluster scope
fluent-operator E0802 20:15:54.123404       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.DaemonSet: failed to list *v1.DaemonSet: daemonsets.apps is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "daemonsets" in API group "apps" at the cluster scope
fluent-operator W0802 20:16:07.306561       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "services" in API group "" at the cluster scope
fluent-operator E0802 20:16:07.306618       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "services" in API group "" at the cluster scope
fluent-operator W0802 20:16:10.656780       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "secrets" in API group "" at the cluster scope
fluent-operator E0802 20:16:10.656875       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "secrets" in API group "" at the cluster scope
fluent-operator W0802 20:16:19.079865       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "statefulsets" in API group "apps" at the cluster scope
fluent-operator E0802 20:16:19.079921       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "statefulsets" in API group "apps" at the cluster scope
fluent-operator W0802 20:16:23.153158       1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "serviceaccounts" in API group "" at the cluster scope
fluent-operator E0802 20:16:23.153227       1 reflector.go:140] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.ServiceAccount: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:telemetry:fluent-operator" cannot list resource "serviceaccounts" in API group "" at the cluster scope

Is it possible to configure RBAC for Fluent Operator such that it doesn't need cluster-wide permissions on these resources while continuing to allow telemetry to be collected from all namespaces? If not, what functionality does the operator provide that requires such permissions?

How did you install fluent operator?

helm upgrade --install fluent-operator --create-namespace -n telemetry <chart_path>

Additional context

No response

boatski avatar Aug 02 '24 21:08 boatski

That's a good idea. We initially set it at the cluster level because Fluent Bit essentially collects logs from all namespaces. Should we tweak it for custom namespaces?

wenchajun avatar Aug 06 '24 03:08 wenchajun

The operator shouldn't need full cluster scope access to daemonsets/statefulsets to collect logs from all namespaces, unless there's something I'm misunderstanding? It would only need those permissions in the namespace fluentbit/fluentd are deployed to. I can see that the operator does need cluster scope access for some plugins to collect from all namespaces so that it can grant those permissions to fluentbit/fluentd, such as the k8s plugin to get pod info.

Are there any use cases I'm maybe not considering that would require cluster scope permission on daemonsets/statefulsets from the operator?
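To illustrate the split being described, here is a hedged sketch of what a reduced ClusterRole might retain once the namespace-scoped workload permissions have moved to a Role. The name `fluent-operator-cluster` is hypothetical, and the `pods`/`namespaces` rule is an assumption about what the Kubernetes metadata plugin needs. Kubernetes RBAC only lets a subject create a ClusterRole containing permissions it already holds itself (unless it has the `escalate` verb), which is why the operator would have to keep whatever it passes on to FluentBit/Fluentd.

```yaml
# Sketch only: cluster-scoped permissions the operator might still need.
# The name and the pods/namespaces rule are assumptions, not the chart's actual rules.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-operator-cluster
rules:
  # Cluster-scoped RBAC objects cannot be granted through a namespaced Role.
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
    verbs:
      - create
      - list
      - get
      - watch
      - patch
  # Assumed read-only permissions that the operator grants on to the collectors.
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
```

This sketch does not address the operator's own list/watch calls at cluster scope shown in the logs above; it only narrows what the ClusterRole would grant.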

boatski avatar Aug 12 '24 16:08 boatski

Is there a workaround or solution for this issue? I'm encountering the same problem.

sherwoodzern avatar Oct 16 '24 01:10 sherwoodzern

@boatski may I know if there is any update or solution for narrowing the permissions down to the minimum that fluent-operator needs?

Besides Fluent Operator, if I leave the rbacrules in FluentBit at their defaults, they might not be enough to collect the log files. I am currently setting them to the same rules as the operator's, which works smoothly, but I am sure that is over-granted.

Could you please help review this issue? It has drawn the attention of our security team and is blocking our current onboarding procedure. @cw-Guo

duj4 avatar Mar 17 '25 06:03 duj4

Yeah, we should definitely review the namespace-related setup for fluent-operator.

We should support both a namespace-scoped install and a cluster-wide install.

But this change is not trivial at all.

cw-Guo avatar Mar 17 '25 23:03 cw-Guo

Thanks for the reply @cw-Guo. I am afraid I will have to trim the permissions locally to get the minimum required clusterroles if this is not on your priority list :(

duj4 avatar Mar 18 '25 13:03 duj4

Any movement on this request? We're seeing much higher memory usage by the fluent-operator in larger clusters with a significant number of secrets/configmaps, to the point that we have to double, triple, or further increase the operator's memory limits because it keeps getting OOM-killed. Aside from better memory management/garbage collection to keep this in check, namespace-scoped RBAC for the operator would seem to help keep this overhead low by limiting the number of resources being cached in such scenarios.

ak185158 avatar Nov 05 '25 14:11 ak185158