
Could not create subdirectory "k8s_logs" inside of data dir "/vector-data-dir": Permission denied (os error 13)

Open arve0 opened this issue 2 years ago • 8 comments

Hi! I get this error message on startup:

2022-11-07T14:01:32.136164Z ERROR vector::topology: Configuration error. error=Source "k8s_logs": Could not create subdirectory "k8s_logs" inside of data dir "/vector-data-dir": Permission denied (os error 13)

I use the following setup:

role: Agent

customConfig:
  data_dir: "/vector-data-dir"
  sources:
    k8s_logs:
      type: kubernetes_logs
  sinks:
    opensearch:
      type: elasticsearch
      endpoint: https://opensearch:9200
      inputs:
        - k8s_logs
      mode: bulk
      compression: none
      auth:
        strategy: basic
        user: xxxxx
        password: xxxxx
      tls:
        verify_certificate: false
        verify_hostname: false

I've tried adding an init container:

        - name: data-dir-permissions
          image: registry.access.redhat.com/ubi9
          command: ["bash", "-c", "set -x; id; ls -ld /vector-data-dir; chgrp -R 3000 /vector-data-dir; chmod g+rwx /vector-data-dir; ls -ld /vector-data-dir"]
          securityContext:
            privileged: true
          volumeMounts:
          - name: data
            mountPath: /vector-data-dir

and using uid/gid/fsGroup 3000 in vector:

      containers:
        - name: vector
          image: "timberio/vector:0.24.1-distroless-libc"
          securityContext:
            runAsUser: 3000
            runAsGroup: 3000
            fsGroup: 3000
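
A note on the snippet above: in the Kubernetes API, fsGroup is a pod-level field (spec.securityContext), not a container-level one, so at container level it is likely to be rejected or silently dropped. A minimal sketch of the split, assuming the rest of the pod template stays as posted. Also note that, as far as I know, Kubernetes does not change ownership of hostPath volumes based on fsGroup, so this alone is not expected to fix the error here.

```yaml
# Sketch of a pod template fragment; only the securityContext placement is the point.
spec:
  securityContext:          # pod-level: fsGroup belongs here
    fsGroup: 3000
  containers:
    - name: vector
      image: "timberio/vector:0.24.1-distroless-libc"
      securityContext:      # container-level: runAsUser/runAsGroup are valid here
        runAsUser: 3000
        runAsGroup: 3000
```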

But it still fails. Debugging the container:

❯ oc debug vector-fffwp --image=ubi9
Starting pod/vector-fffwp-debug ...
Pod IP: 10.128.2.53
If you don't see a command prompt, try pressing enter.
sh-5.1$ id 
uid=3000(3000) gid=3000 groups=3000
sh-5.1$ ls -ld /vector-data-dir/
drwxrwxr-x. 2 root 3000 6 Nov  7 13:43 /vector-data-dir/
sh-5.1$ mkdir -p /vector-data-dir/a
mkdir: cannot create directory '/vector-data-dir/a': Permission denied

Any ideas?

arve0 avatar Nov 07 '22 14:11 arve0

Viewed from the host, the uid/gid seem correct:

❯ oc debug node/domstoltestocpin101
Starting pod/domstoltestocpin101-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.242.158.20
If you don't see a command prompt, try pressing enter.
sh-4.4# ls -ld /host/var/lib/vector
drwxrwxr-x. 2 3000 3000 6 Nov  7 13:43 /host/var/lib/vector

arve0 avatar Nov 07 '22 14:11 arve0

In my case, the error message is as below.

2023-01-11T06:56:35.919540Z ERROR vector::topology: Configuration error. error=Source "task_log": Could not create subdirectory "task_log" inside of data dir "/var/lib/vector/": Read-only file system (os error 30)

This is caused by a volumeMount error in the PodSpec. Check whether your volumeMount has readOnly set, or post your pod YAML.
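
For illustration, a minimal sketch of what to look for. The volume name "data" and the hostPath are assumptions, not taken from an actual manifest in this thread:

```yaml
# Illustrative pod spec fragment (names and paths are assumptions).
spec:
  containers:
    - name: vector
      volumeMounts:
        - name: data
          mountPath: /var/lib/vector
          # readOnly: true here makes the data dir unwritable and would match
          # "Read-only file system (os error 30)". Omit it or set it to false.
          readOnly: false
  volumes:
    - name: data
      hostPath:
        path: /var/lib/vector
```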

Source Code from here

swartz-k avatar Jan 11 '23 07:01 swartz-k

Trying to reproduce this today with the following config (updated for latest Helm and Vector versions):

role: Agent

service:
  enabled: false
serviceHeadless:
  enabled: false

customConfig:
  data_dir: "/vector-data-dir"
  sources:
    k8s_logs:
      type: kubernetes_logs
  sinks:
    opensearch:
      type: elasticsearch
      endpoint: https://opensearch:9200
      inputs:
        - k8s_logs
      mode: bulk
      bulk:
        index: "vector-%Y.%m.%d"
      compression: none
      auth:
        strategy: basic
        user: xxxxx
        password: xxxxx
      tls:
        verify_certificate: false
        verify_hostname: false

I don't see any error when running locally on colima:

❯ kubectl logs pod/vector-6zh9r
2023-03-09T14:23:54.444835Z  INFO vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2023-03-09T14:23:54.448176Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-03-09T14:23:54.448602Z  INFO vector::app: Loading configs. paths=["/etc/vector"]
2023-03-09T14:23:54.499656Z  INFO source{component_kind="source" component_id=k8s_logs component_type=kubernetes_logs component_name=k8s_logs}: vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="colima"
2023-03-09T14:23:54.587269Z  INFO source{component_kind="source" component_id=k8s_logs component_type=kubernetes_logs component_name=k8s_logs}: vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["**/*.gz", "**/*.tmp"]
2023-03-09T14:23:54.589787Z  WARN vector::sinks::elasticsearch::common: DEPRECATION, use of deprecated option `endpoint`. Please use `endpoints` option instead.
2023-03-09T14:23:54.594123Z  WARN vector_core::tls::settings: The `verify_certificate` option is DISABLED, this may lead to security vulnerabilities.
2023-03-09T14:23:54.594898Z  WARN vector_core::tls::settings: The `verify_hostname` option is DISABLED, this may lead to security vulnerabilities.

I suspect this is due to restrictions imposed by OpenShift. Could you confirm you're still seeing this issue after upgrading to latest?

spencergilbert avatar Mar 09 '23 14:03 spencergilbert

I suspect this is due to restrictions imposed by OpenShift.

I can confirm that. When adding a SecurityContextConstraint with correct permissions, it works.

Would you like me to contribute back the SecurityContextConstraint under a flag, say openshift: true?

arve0 avatar Mar 09 '23 17:03 arve0

That'd be great. I don't have much experience with OpenShift, but if that's a normal/expected resource to create in OpenShift clusters, that seems good.

spencergilbert avatar Mar 09 '23 17:03 spencergilbert

What was the fix? I tried with a custom privileged scc and for troubleshooting set runAsUser to 0 but I still get the permission errors.

Edit: I had to set privileged: true in the container security context for it to work.

Honken77 avatar Sep 20 '23 07:09 Honken77

Edit: I had to set privileged: true in the container security context for it to work.

Correct. I set it in the chart values:

securityContext:
  privileged: true

Then I added an SCC, Role and RoleBinding on the side:

# vector needs privileged access to write to /var/lib/vector on the node.
# Only the initContainer uses privileged access; the vector container runs as uid/gid 3000.
---
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: privileged-and-hostpath
  annotations:
    kubernetes.io/description: |
      Copied from restricted. Additionally has allowHostDirVolumePlugin=true, volumes: hostPath,
      and allowPrivilegedContainer=true.
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- hostPath
- persistentVolumeClaim
- projected
- secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-privileged-and-hostpath
rules:
  - apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    verbs:
      - use
    resourceNames:
      - privileged-and-hostpath
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vector-can-use-privileged-and-hostpath
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-privileged-and-hostpath
subjects:
  - kind: ServiceAccount
    name: vector

I tried using SecurityContextConstraints.allowedCapabilities without allowPrivilegedContainer, but never got that working. I found that openshift-logging also uses allowPrivilegedContainer, so I settled on that.

arve0 avatar Sep 21 '23 09:09 arve0

Hi,

Try to avoid setting privileged: true, because it basically gives the vector pod root access to the underlying host.

Set these back in your SCC and remove privileged: true from the container:

allowPrivilegeEscalation: false
allowPrivilegedContainer: false

Then add this to your DaemonSet (shown here as a JSON patch operation):

      - op: add
        path: "/spec/template/spec/containers/0/securityContext"
        value:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - CHOWN
            drop:
            - KILL
            - DAC_OVERRIDE
            - FOWNER
            - NET_BIND_SERVICE
            - FSETID
            - SETGID
            - SETUID
            - SETPCAP
          privileged: false
          seLinuxOptions:
            type: container_logwriter_t
          seccompProfile:
            type: RuntimeDefault
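
One way to apply the operation above is through a Kustomize JSON patch. This is only a sketch: the file name vector-daemonset.yaml and the DaemonSet name vector are assumptions, so adjust them to your rendered manifests.

```yaml
# kustomization.yaml (sketch; resource file name and target name are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - vector-daemonset.yaml        # hypothetical: the rendered Vector DaemonSet
patches:
  - target:
      kind: DaemonSet
      name: vector               # assumed DaemonSet name
    patch: |-
      # JSON patch: sets the securityContext of the first container
      - op: add
        path: "/spec/template/spec/containers/0/securityContext"
        value:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - CHOWN
            drop:
            - KILL
            - DAC_OVERRIDE
            - FOWNER
            - NET_BIND_SERVICE
            - FSETID
            - SETGID
            - SETUID
            - SETPCAP
          privileged: false
          seLinuxOptions:
            type: container_logwriter_t
          seccompProfile:
            type: RuntimeDefault
```

Note that a JSON patch add on an existing object member replaces its value, so any securityContext already set on that container is overwritten rather than merged.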

I would also suggest applying this MachineConfig (written here as a Butane config) to the nodes where vector is running (in my case, all worker nodes):

variant: openshift
version: 4.14.0
metadata:
  name: 50-selinux-file-contexts-local
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/selinux/targeted/contexts/files/file_contexts.local
      mode: 0644
      overwrite: true
      contents:
        inline: |
          /var/lib/vector(/.*)?    system_u:object_r:container_file_t:s0
systemd:
  units:
    - contents: |-
        [Unit]
        Description=Set local SELinux file context for vector

        [Service]
        ExecStart=/bin/bash -c '/usr/bin/mkdir -p /var/lib/vector;restorecon -Rv /var/lib/vector'
        RemainAfterExit=yes
        Type=oneshot

        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: set-SELinux-context-local.service

jonasbartho avatar Jan 24 '24 18:01 jonasbartho