
Scheduler crashed with error "PermissionError: [Errno 13] Permission denied" even with extraInitContainers and PVC.

Open ravilk opened this issue 1 year ago • 3 comments

Official Helm Chart version

1.13.1 (latest released)

Apache Airflow version

2.9.0

Kubernetes Version

v1.28

Helm Chart configuration


executor: "KubernetesExecutor"

enableBuiltInSecretEnvVars:
  AIRFLOW__CORE__FERNET_KEY: true
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: true
  AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: true
  AIRFLOW_CONN_AIRFLOW_DB: true
  AIRFLOW__WEBSERVER__SECRET_KEY: true
  AIRFLOW__CELERY__CELERY_RESULT_BACKEND: true
  AIRFLOW__CELERY__RESULT_BACKEND: true
  AIRFLOW__CELERY__BROKER_URL: true
  AIRFLOW__ELASTICSEARCH__HOST: true
  AIRFLOW__ELASTICSEARCH__ELASTICSEARCH_HOST: true

###### Enable external RDS for Airflow Metadata ######
  metadataSecretName: airflow-rds-db
  resultBackendSecretName: airflow-rds-db

###### Disable Kubernetes based Postgres for Airflow Metadata ######
  postgresql:
   enabled: false
   auth:
     enablePostgresUser: true
     postgresPassword: postgres
     username: ""
     password: ""

########### Scheduler ########### 
scheduler:
  enabled: true
  
  extraContainers: []
  extraInitContainers:
  - name: fix-volume-logs-permissions
    image: busybox
    command: [ "sh", "-c", "chown -R 50000:0 /opt/airflow/logs/" ]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - mountPath: /opt/airflow/logs/
        name: logs

###### Airflow.cfg #######
config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    load_examples: 'False'
    executor: '{{ .Values.executor }}'
    colored_console_log: 'False'
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  logging:
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    colored_console_log: 'False'
  metrics:
    statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}'
    statsd_port: 9125
    statsd_prefix: airflow
    statsd_host: '{{ printf "%s-statsd" (include "airflow.fullname" .) }}'
  webserver:
    enable_proxy_fix: 'True'
    airflow__webserver__base_url: "https://airflow.bizapps-dev.purestorage.com"
    rbac: 'True'
  celery:
    flower_url_prefix: '{{ ternary "" .Values.ingress.flower.path (eq .Values.ingress.flower.path "/") }}'
    worker_concurrency: 16
  scheduler:
    standalone_dag_processor: '{{ ternary "True" "False" .Values.dagProcessor.enabled }}'
    statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}'
    statsd_port: 9125
    statsd_prefix: airflow
    statsd_host: '{{ printf "%s-statsd" (include "airflow.fullname" .) }}'
    run_duration: 41460
  elasticsearch:
    json_format: 'True'
    log_id_template: "{dag_id}_{task_id}_{execution_date}_{try_number}"
  elasticsearch_configs:
    max_retries: 3
    timeout: 30
    retry_timeout: 'True'
  kerberos:
    keytab: '{{ .Values.kerberos.keytabPath }}'
    reinit_frequency: '{{ .Values.kerberos.reinitFrequency }}'
    principal: '{{ .Values.kerberos.principal }}'
    ccache: '{{ .Values.kerberos.ccacheMountPath }}/{{ .Values.kerberos.ccacheFileName }}'
  celery_kubernetes_executor:
    kubernetes_queue: 'kubernetes'
  kubernetes:
    namespace: '{{ .Release.Namespace }}'
    airflow_configmap: '{{ include "airflow_config" . }}'
    airflow_local_settings_configmap: '{{ include "airflow_config" . }}'
    pod_template_file: '{{ include "airflow_pod_template_file" . }}/pod_template_file.yaml'
    worker_container_repository: '{{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}'
    worker_container_tag: '{{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}'
    multi_namespace_mode: '{{ ternary "True" "False" .Values.multiNamespaceMode }}'
  kubernetes_executor:
    namespace: '{{ .Release.Namespace }}'
    pod_template_file: '{{ include "airflow_pod_template_file" . }}/pod_template_file.yaml'
    worker_container_repository: '{{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}'
    worker_container_tag: '{{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}'
    multi_namespace_mode: '{{ ternary "True" "False" .Values.multiNamespaceMode }}'
  triggerer:
    default_capacity: 1000


########### DAG ########### 
dags:
  # Where dags volume will be mounted. Works for both persistence and gitSync.
  # If not specified, dags mount path will be set to $AIRFLOW_HOME/dags
  mountPath: ~
  persistence:
    annotations: {}
    enabled: false
    size: 1Gi
    storageClassName:
    accessMode: ReadWriteOnce
    existingClaim:
    subPath: ~
  gitSync:
    enabled: true
    repo: https://github.com/....
    branch: main
    rev: HEAD
    ref: main
    depth: 1
    maxFailures: 0
    subPath: "dags"
    credentialsSecret: git-credentials


logs:
  # Configuration for empty dir volume (if logs.persistence.enabled == false)
  # emptyDirConfig:
  #   sizeLimit: 1Gi
  #   medium: Memory

  persistence:
    enabled: true
    size: 80Gi
    annotations: {}
    storageClassName: 
    existingClaim: dna-airflow-logs

----------------------------------------------------------------------
####### Airflow logs PVC ####### 

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    meta.helm.sh/release-name: dna-airflow
    meta.helm.sh/release-namespace: airflow
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
    volume.kubernetes.io/selected-node: ip-172-18-231-89.us-west-2.compute.internal
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
  creationTimestamp: "2024-04-10T20:51:02Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/managed-by: Helm
    chart: airflow-1.13.1
    component: logs-pvc
    heritage: Helm
    release: dna-airflow
    tier: airflow
  name: dna-airflow-logs
  namespace: airflow
  resourceVersion: "123093957"
  uid: 4d24054d-5e6a-49bf-ba8c-79f9ea9298a3
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp3
  volumeMode: Filesystem


Docker Image customizations

Custom Docker image: ravilkhalilov/airflow-demo:2.0
digest: sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235

Dockerfile content:

FROM apache/airflow:2.9.0

RUN pip install apache-airflow-providers-tableau snowflake-connector-python snowflake-sqlalchemy apache-airflow-providers-snowflake pendulum


What happened

Hi,

I am using a custom Docker image based on the official Docker image, with the latest version of Airflow (2.9.0). I'm able to deploy Airflow using the official Helm chart on AWS EKS.

But after a while, my scheduler keeps restarting in a loop. I then found that the scheduler-log-groomer container was missing permissions on the "/opt/airflow/logs" folder.

I updated my values.yaml file with extraInitContainers, as shown in the scheduler section of the values.yaml above.

But after upgrading the chart, I still see scheduler errors in the logs. Now the liveness probe is not able to access the "/opt/airflow/logs/scheduler" folder.

What you think should happen instead

Name:             dna-airflow-scheduler-5cc8cfd8f6-hx2bl
Namespace:        airflow
Priority:         0
Service Account:  dna-airflow-scheduler
Node:             ip-172-18-231-89.us-west-2.compute.internal/172.18.231.89
Start Time:       Fri, 12 Apr 2024 22:49:41 +0200
Labels:           component=scheduler
                  pod-template-hash=5cc8cfd8f6
                  release=dna-airflow
                  tier=airflow
Annotations:      checksum/airflow-config: 6fb676fa1295f9e8afd5408033a62ecaf465ddd5339dff805ffbcf8e653848dc
                  checksum/extra-configmaps: e862ea47e13e634cf17d476323784fa27dac20015550c230953b526182f5cac8
                  checksum/extra-secrets: e9582fdd622296c976cbc10a5ba7d6702c28a24fe80795ea5b84ba443a56c827
                  checksum/metadata-secret: b2fe937560e9635aeb01fce9100c2f836c5880f81c802565ce95fbcc8a56da4c
                  checksum/pgbouncer-config-secret: 1dae2adc757473469686d37449d076b0c82404f61413b58ae68b3c5e99527688
                  checksum/result-backend-secret: 98a68f230007cfa8f5d3792e1aff843a76b0686409e4a46ab2f092f6865a1b71
                  cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:           Running
IP:               100.65.129.143
IPs:
  IP:           100.65.129.143
Controlled By:  ReplicaSet/dna-airflow-scheduler-5cc8cfd8f6
Init Containers:
  wait-for-airflow-migrations:
    Container ID:  containerd://2bce7a37555105485909d706f8d5264156f87a598dc9e14a0272db55cb5f328c
    Image:         ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Image ID:      docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Port:          <none>
    Host Port:     <none>
    Args:
      airflow
      db
      check-migrations
      --migration-wait-timeout=60
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 12 Apr 2024 22:49:44 +0200
      Finished:     Fri, 12 Apr 2024 22:50:17 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG:    True
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'dna-airflow-fernet-key'>  Optional: false
      AIRFLOW_HOME:                         /opt/airflow
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'dna-airflow-webserver-secret-key'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
  git-sync-init:
    Container ID:   containerd://6fab71c316323da5dd38ab10af8c70ca0fd6590152f985fb3ed152987631dafd
    Image:          registry.k8s.io/git-sync/git-sync:v4.1.0
    Image ID:       registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 12 Apr 2024 22:50:18 +0200
      Finished:     Fri, 12 Apr 2024 22:50:21 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      GIT_SYNC_USERNAME:           <set to the key 'GIT_SYNC_USERNAME' in secret 'git-credentials'>  Optional: false
      GITSYNC_USERNAME:            <set to the key 'GITSYNC_USERNAME' in secret 'git-credentials'>   Optional: false
      GIT_SYNC_PASSWORD:           <set to the key 'GIT_SYNC_PASSWORD' in secret 'git-credentials'>  Optional: false
      GITSYNC_PASSWORD:            <set to the key 'GITSYNC_PASSWORD' in secret 'git-credentials'>   Optional: false
      GIT_SYNC_REV:                HEAD
      GITSYNC_REF:                 main
      GIT_SYNC_BRANCH:             main
      GIT_SYNC_REPO:               https://github.com/.../airflow-bizapps-dev.git
      GITSYNC_REPO:                https://github.com/.../airflow-bizapps-dev.git
      GIT_SYNC_DEPTH:              1
      GITSYNC_DEPTH:               1
      GIT_SYNC_ROOT:               /git
      GITSYNC_ROOT:                /git
      GIT_SYNC_DEST:               repo
      GITSYNC_LINK:                repo
      GIT_SYNC_ADD_USER:           true
      GITSYNC_ADD_USER:            true
      GITSYNC_PERIOD:              5s
      GIT_SYNC_MAX_SYNC_FAILURES:  0
      GITSYNC_MAX_FAILURES:        0
      GIT_SYNC_ONE_TIME:           true
      GITSYNC_ONE_TIME:            true
    Mounts:
      /git from dags (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
  fix-volume-logs-permissions:
    Container ID:  containerd://1cae61ad7f4eab50dd360f064b8645fd6c7fde7696e2346e2098d5d3e81c4879
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      chown -R 50000:0 /opt/airflow/logs/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 12 Apr 2024 22:50:23 +0200
      Finished:     Fri, 12 Apr 2024 22:50:23 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/airflow/logs/ from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
Containers:
  scheduler:
    Container ID:  containerd://43eff9161606c18596c87a4504de3e782367cf87ec63ecc0650f72db32d75032
    Image:         ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Image ID:      docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      -c
      exec airflow scheduler
    State:          Running
      Started:      Fri, 12 Apr 2024 23:32:46 +0200
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 12 Apr 2024 23:23:46 +0200
      Finished:     Fri, 12 Apr 2024 23:32:45 +0200
    Ready:          True
    Restart Count:  6
    Liveness:       exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type SchedulerJob --local
] delay=10s timeout=20s period=60s #success=1 #failure=5
    Startup:  exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type SchedulerJob --local
] delay=0s timeout=20s period=10s #success=1 #failure=6
    Environment:
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG:    True
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'dna-airflow-fernet-key'>  Optional: false
      AIRFLOW_HOME:                         /opt/airflow
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-rds-db'>                              Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'dna-airflow-webserver-secret-key'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
      /opt/airflow/dags from dags (ro)
      /opt/airflow/logs from logs (rw)
      /opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
  git-sync:
    Container ID:   containerd://efae96265bca88f581c3e92c5f138f3ae7ca4db805aa0797016ccb375ad4b90e
    Image:          registry.k8s.io/git-sync/git-sync:v4.1.0
    Image ID:       registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 12 Apr 2024 22:50:23 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      GIT_SYNC_USERNAME:           <set to the key 'GIT_SYNC_USERNAME' in secret 'git-credentials'>  Optional: false
      GITSYNC_USERNAME:            <set to the key 'GITSYNC_USERNAME' in secret 'git-credentials'>   Optional: false
      GIT_SYNC_PASSWORD:           <set to the key 'GIT_SYNC_PASSWORD' in secret 'git-credentials'>  Optional: false
      GITSYNC_PASSWORD:            <set to the key 'GITSYNC_PASSWORD' in secret 'git-credentials'>   Optional: false
      GIT_SYNC_REV:                HEAD
      GITSYNC_REF:                 main
      GIT_SYNC_BRANCH:             main
      GIT_SYNC_REPO:               https://github.com/.../airflow-bizapps-dev.git
      GITSYNC_REPO:                https://github.com/.../airflow-bizapps-dev.git
      GIT_SYNC_DEPTH:              1
      GITSYNC_DEPTH:               1
      GIT_SYNC_ROOT:               /git
      GITSYNC_ROOT:                /git
      GIT_SYNC_DEST:               repo
      GITSYNC_LINK:                repo
      GIT_SYNC_ADD_USER:           true
      GITSYNC_ADD_USER:            true
      GITSYNC_PERIOD:              5s
      GIT_SYNC_MAX_SYNC_FAILURES:  0
      GITSYNC_MAX_FAILURES:        0
    Mounts:
      /git from dags (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
  scheduler-log-groomer:
    Container ID:  containerd://dc0b21c1c37f7aed6a7ff67e812ff95a125297ddae1975f2365511cf9b0a3cbc
    Image:         ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Image ID:      docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      /clean-logs
    State:          Running
      Started:      Fri, 12 Apr 2024 22:50:24 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      AIRFLOW__LOG_RETENTION_DAYS:  15
      AIRFLOW_HOME:                 /opt/airflow
    Mounts:
      /opt/airflow/logs from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dna-airflow-config
    Optional:  false
  dags:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  dna-airflow-logs
    ReadOnly:   false
  kube-api-access-t7tng:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  48m   default-scheduler  Successfully assigned airflow/dna-airflow-scheduler-5cc8cfd8f6-hx2bl to ip-172-18-231-89.us-west-2.compute.internal
  Normal   Pulling    48m   kubelet            Pulling image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235"
  Normal   Pulled     48m   kubelet            Successfully pulled image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" in 1.918s (1.918s including waiting)
  Normal   Created    48m   kubelet            Created container wait-for-airflow-migrations
  Normal   Started    48m   kubelet            Started container wait-for-airflow-migrations
  Normal   Pulled     48m   kubelet            Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
  Normal   Created    48m   kubelet            Created container git-sync-init
  Normal   Started    48m   kubelet            Started container git-sync-init
  Normal   Pulling    48m   kubelet            Pulling image "busybox"
  Normal   Pulled     48m   kubelet            Successfully pulled image "busybox" in 619ms (619ms including waiting)
  Normal   Created    48m   kubelet            Created container fix-volume-logs-permissions
  Normal   Pulled     48m   kubelet            Container image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" already present on machine
  Normal   Started    48m   kubelet            Started container fix-volume-logs-permissions
  Normal   Created    48m   kubelet            Created container scheduler
  Normal   Started    48m   kubelet            Started container scheduler
  Normal   Pulled     48m   kubelet            Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
  Normal   Created    48m   kubelet            Created container git-sync
  Normal   Started    48m   kubelet            Started container git-sync
  Normal   Created    48m   kubelet            Created container scheduler-log-groomer
  Normal   Started    48m   kubelet            Started container scheduler-log-groomer
  Warning  Unhealthy  47m   kubelet            Startup probe failed: command "sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \\\nairflow jobs check --job-type SchedulerJob --local\n" timed out
  Warning  Unhealthy  47m   kubelet            Startup probe failed: /home/airflow/.local/lib/python3.12/site-packages/airflow/metrics/statsd_logger.py:184 RemovedInAirflow3Warning: The basic metric validator will be deprecated in the future in favor of pattern-matching.  You can try this now by setting config option metrics_use_pattern_match to True.
No alive jobs found.
  Normal   Pulled     47m (x2 over 48m)   kubelet  Container image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" already present on machine
  Warning  Unhealthy  47m                 kubelet  Startup probe failed:
  Warning  Unhealthy  47m                 kubelet  Startup probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task cb4dafd9cf37d9ad90bd10a4d36ae47b7d3c3b714efd4a1022a971fce25ca6be not found: not found
  Warning  Unhealthy  49s (x26 over 45m)  kubelet  Liveness probe failed: /home/airflow/.local/lib/python3.12/site-packages/airflow/metrics/statsd_logger.py:184 RemovedInAirflow3Warning: The basic metric validator will be deprecated in the future in favor of pattern-matching.  You can try this now by setting config option metrics_use_pattern_match to True.
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/pathlib.py", line 1311, in mkdir
    os.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler/2024-04-12'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/logging/config.py", line 581, in configure
    handler = self.configure_handler(handlers[name])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/logging/config.py", line 848, in configure_handler
    result = factory(**kwargs)
             ^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/log/file_processor_handler.py", line 53, in __init__
    Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.12/pathlib.py", line 1320, in mkdir
    if not exist_ok or not self.is_dir():
                           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 875, in is_dir
    return S_ISDIR(self.stat().st_mode)
                   ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 840, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler/2024-04-12'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/__init__.py", line 61, in <module>
    settings.initialize()
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/settings.py", line 531, in initialize
    LOGGING_CLASS_PATH = configure_logging()
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/logging_config.py", line 74, in configure_logging
    raise e
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/logging_config.py", line 69, in configure_logging
    dictConfig(logging_config)
  File "/usr/local/lib/python3.12/logging/config.py", line 914, in dictConfig
    dictConfigClass(config).configure()
  File "/usr/local/lib/python3.12/logging/config.py", line 588, in configure
    raise ValueError('Unable to configure handler '
ValueError: Unable to configure handler 'processor'
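
The traceback shows the failure happens while Airflow configures its 'processor' logging handler: file_processor_handler.py calls Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True), and the stat() on /opt/airflow/logs/scheduler/2024-04-12 is denied. The sketch below repeats that call from inside the scheduler container (for example via kubectl exec) without importing Airflow, and reports which ancestor directory the current user cannot search or write. It assumes, based only on the path in the traceback, that the daily directory is named after the current UTC date; check_processor_log_dir is a hypothetical helper, not part of Airflow.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def check_processor_log_dir(base: str = "/opt/airflow/logs/scheduler") -> str:
    """Repeat the mkdir call seen in the traceback and describe the outcome.

    Assumption (from the traceback path only): the daily directory is
    <base>/<YYYY-MM-DD> using the current UTC date.
    """
    daily = Path(base) / datetime.now(timezone.utc).strftime("%Y-%m-%d")
    try:
        # Same call as in airflow/utils/log/file_processor_handler.py (line 53 in the traceback).
        daily.mkdir(parents=True, exist_ok=True)
    except PermissionError as exc:
        # Walk up the tree to show which ancestor blocks search (x) or write (w).
        report = [f"uid={os.getuid()} gid={os.getgid()} failed: {exc}"]
        for parent in [daily, *daily.parents]:
            can_search = os.access(parent, os.X_OK)
            can_write = os.access(parent, os.W_OK)
            report.append(f"{parent}: search={can_search} write={can_write}")
        return "\n".join(report)
    return f"ok: {daily}"


if __name__ == "__main__":
    print(check_processor_log_dir())
```

If this prints a failure while the init container's chown has already run, the search=False line points at the directory whose mode or ownership changed after startup.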

How to reproduce

Create a PVC for the logs in AWS EKS with the same parameters as shown above. Deploy Airflow with the official Helm chart on AWS EKS, using the PVC created earlier and the custom Docker image mentioned above. Use an external database as the metadata store.

Anything else

The problem occurs approximately every 15 minutes.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

ravilk, Apr 15 '24 09:04

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

boring-cyborg[bot], Apr 15 '24 09:04

Could you please check and dump the state of your shared logs volume when it happens, including information about the symbolic link that is failing, its permissions and its ownership?

It seems this is triggered by log rotation, so you can disable log rotation if you want to avoid the issue. Still, information about the permissions of all your folders would be useful to track it down.

potiuk, Apr 18 '24 15:04
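
To collect what was asked for above (permissions, ownership and symbolic links on the shared log volume), a standard-library script like the following could be run in the scheduler or scheduler-log-groomer container when the error appears. It is a minimal sketch: describe and dump_tree are hypothetical helper names, and the default root /opt/airflow/logs is taken from the mount path in the pod description. It lists directories and symlinks (not regular log files) with their mode, uid:gid and link target, and flags dangling links.

```python
import os
import stat
import sys


def describe(path: str) -> str:
    """Return 'mode uid:gid path', plus the target for symbolic links."""
    try:
        st = os.lstat(path)  # lstat reports the link itself, not its target
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    line = f"{stat.filemode(st.st_mode)} {st.st_uid}:{st.st_gid} {path}"
    if stat.S_ISLNK(st.st_mode):
        # os.path.exists follows the link, so False means the link is dangling.
        broken = "" if os.path.exists(path) else " (broken)"
        line += f" -> {os.readlink(path)}{broken}"
    return line


def dump_tree(root: str = "/opt/airflow/logs", max_depth: int = 3) -> list[str]:
    """Describe root and every directory or symlink up to max_depth levels below it."""
    lines = [describe(root)]
    root_depth = root.rstrip(os.sep).count(os.sep)
    # os.walk does not follow directory symlinks by default; walk errors are recorded.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda e: lines.append(f"walk error: {e}")):
        entries = sorted(dirnames + filenames)
        if dirpath.rstrip(os.sep).count(os.sep) - root_depth + 1 >= max_depth:
            dirnames[:] = []  # do not descend past max_depth
        for name in entries:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or os.path.isdir(full):
                lines.append(describe(full))
    return lines


if __name__ == "__main__":
    print(f"uid={os.getuid()} gid={os.getgid()} groups={os.getgroups()}")
    for entry in dump_tree(sys.argv[1] if len(sys.argv) > 1 else "/opt/airflow/logs"):
        print(entry)
```

Running it once on a healthy pod and again right after a failed probe would make any change in ownership, mode or symlink targets visible.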

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

github-actions[bot], May 03 '24 00:05

This issue has been closed because it has not received response from the issue author.

github-actions[bot], May 10 '24 00:05