
Review k8s Airflow setup spawning many workers

Open · pmbrull opened this issue 7 months ago

I am running OpenMetadata version 1.5.6.0 using a Helm deployment. I notice that as soon as I create a metadata job from the UI, it creates 4 new gunicorn workers whose parent process is airflow-scheduler, and they very soon become zombie processes. Even when the job runs successfully, the processes seem to remain alive for a long time afterward (I am not sure about this). Each worker process takes around 500 MB. Now we are trying to execute multiple jobs sequentially, and I saw 40 pending gunicorn worker processes still alive, and my VM is running out of resources. Have I been doing anything wrong?

From https://openmetadata.slack.com/archives/C02B6955S4S/p1744214845401779

pmbrull, Apr 10 '25
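A quick way to check the numbers in this report is to list the gunicorn processes together with their parent PID, state, and resident memory. The Python sketch below is a hypothetical helper, not part of OpenMetadata or Airflow; it assumes `ps` from procps is available inside the container (for example via `kubectl exec` into the scheduler pod) and that the processes can be recognized by the string `gunicorn` in their command line. Two caveats shape the design. First, a true zombie (STAT `Z`) has already released its memory and only waits to be reaped by its parent, so processes that each hold ~500 MB are live workers rather than zombies in the kernel sense; the helper counts real zombies separately. Second, forked gunicorn workers share copy-on-write pages, so summing RSS overstates the real total; the `Pss` value in `/proc/<pid>/smaps_rollup` is a better per-process measure.

```python
import subprocess
from collections import Counter


def collect_ps() -> str:
    """Return `ps` output with the columns summarize_gunicorn expects (assumes procps `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,rss,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def summarize_gunicorn(ps_output: str) -> dict:
    """Count gunicorn processes in `ps -eo pid,ppid,stat,rss,args` output.

    Returns the number of matching processes, how many are real zombies
    (STAT starting with "Z"), their summed RSS in KiB, and a count per parent PID.
    """
    count = zombies = rss_kib = 0
    by_ppid = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # first line is the ps header
        fields = line.split(None, 4)  # the last column (args) may contain spaces
        if len(fields) < 5:
            continue
        _pid, ppid, stat, rss, args = fields
        if "gunicorn" not in args:
            continue
        count += 1
        by_ppid[int(ppid)] += 1
        rss_kib += int(rss)  # procps reports RSS in KiB
        if stat.startswith("Z"):
            zombies += 1
    return {"count": count, "zombies": zombies, "rss_kib": rss_kib, "by_ppid": dict(by_ppid)}
```

Usage, assuming `ps` exists in the image: `summarize_gunicorn(collect_ps())` returns the totals, and `by_ppid` shows whether all workers hang off the scheduler's PID.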

helm

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: openmetadata
spec:
  interval: 5m
  chart:
    spec:
      chart: openmetadata
      version: '1.5.6'
      sourceRef:
        kind: HelmRepository
        name: open-metadata
        namespace: openmetadata
  values:
    resources:
      requests:
        cpu: 250m
        memory: 3Gi
      limits:
        cpu: 1
        memory: 3Gi
    openmetadata:
      config:
        authorizer:
          className: "org.openmetadata.service.security.DefaultAuthorizer"
          containerRequestFilter: "org.openmetadata.service.security.JwtFilter"
        elasticsearch:
          searchType: opensearch
          port: 9200
          scheme: http
          connectionTimeoutSecs: 5
          socketTimeoutSecs: 60
          keepAliveTimeoutSecs: 600
          batchSize: 10
          auth:
            enabled: true
            password:
              secretRef: omd-es-secrets
              secretKey: password
        database:
          port: 5432
          driverClass: org.postgresql.Driver
          dbScheme: postgresql
          databaseName: openmetadata-db
          auth:
            username: openmetadata
            password:
              secretRef: omd-db-secrets
              secretKey: password
        pipelineServiceClientConfig:
          enabled: true
          # endpoint url for airflow
          apiEndpoint: http://openmetadata-dependencies-web.openmetadata.svc.cluster.local:8080
          auth:
            username: admin
            password:
              secretRef: omd-airflow-secrets
              secretKey: password
          metadataApiEndpoint: http://openmetadata.openmetadata.svc.cluster.local:8585/api
        authentication:
          enabled: false
        jwtTokenConfiguration:
          enabled: false
        eventMonitor:
          enabled: false
        smtpConfig:
          enableSmtpServer: false
        secretsManager:
          enabled: false
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: openmetadata-dependencies
spec:
  interval: 5m
  chart:
    spec:
      chart: openmetadata-dependencies
      version: '1.5.6'
      sourceRef:
        kind: HelmRepository
        name: open-metadata
        namespace: openmetadata
  values:
    mysql:
      enabled: false
    opensearch:
      enabled: false
    airflow:
      enabled: true
      airflow:
        image:
          repository: docker.getcollate.io/openmetadata/ingestion
          tag: 1.5.6.0
          pullPolicy: IfNotPresent
        executor: "KubernetesExecutor"
        config:
          # This is required for OpenMetadata UI to fetch status of DAGs
          AIRFLOW__API__AUTH_BACKENDS: "airflow.api.auth.backend.session,airflow.api.auth.backend.basic_auth"
          # OpenMetadata Airflow Apis Plugin DAGs Configuration
          AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/opt/airflow/dags"
        users:
        - username: admin
          password: ${ADMIN_PASSWORD}
          role: Admin
          email: [email protected]
          firstName: admin
          lastName: admin
        usersTemplates:
          ADMIN_PASSWORD:
            kind: secret
            name: omd-airflow-secrets
            key: password
        kubernetesPodTemplate:
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 4
              memory: 2Gi
        dbMigrations:
          resources:
            requests:
              cpu: 50m
              memory: 512Mi
            limits:
              cpu: 200m
              memory: 512Mi
        sync:
          resources:
            requests:
              cpu: 125m
              memory: 1Gi
            limits:
              cpu: 500m
              memory: 1Gi
      web:
        replicas: 1
        resources:
          requests:
            cpu: 250m
            memory: 6Gi
          limits:
            cpu: 1
            memory: 6Gi
      triggerer:
        resources:
          requests:
            cpu: 250m
            memory: 2Gi
          limits:
            cpu: 1
            memory: 2Gi
      pgbouncer:
        replicas: 1
        resources:
          requests:
            cpu: 25m
            memory: 256Mi
          limits:
            cpu: 200m
            memory: 256Mi
      scheduler:
        replicas: 1
        resources:
          requests:
            cpu: 1
            memory: 4Gi
          limits:
            cpu: 4
            memory: 4Gi
        logCleanup:
          enabled: false
      postgresql:
        enabled: false
      workers:
        enabled: false
      flower:
        enabled: false
      redis:
        enabled: false
      externalDatabase:
        type: postgres
        port: 5432
        database: openmetadata-airflow-db
        user: openmetadata-airflow
        passwordSecret: omd-airflow-secrets
        passwordSecretKey: postgresql-password
      serviceAccount:
        create: true
        name: "airflow"
      dags:
        persistence:
          enabled: true
          # NOTE: "" means cluster-default
          storageClass: ""
          size: 1Gi
          accessMode: ReadWriteMany
      logs:
        persistence:
          enabled: true
          # empty string means cluster-default
          storageClass: ""
          accessMode: ReadWriteMany
          size: 1Gi

pmbrull, Apr 10 '25
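To compare the Helm values above with the capacity of the VM, it helps to add up the declared memory limits. The sketch below is a hypothetical helper (not part of Helm, Flux, or OpenMetadata) that parses Kubernetes resource quantities such as `3Gi`, `512Mi`, or `250m`; it handles only plain decimal numbers with the standard binary, decimal, and milli suffixes, not exponent forms like `1e3`. The example totals a subset of the components in the values, assuming `kubernetesPodTemplate.resources` applies to every KubernetesExecutor task pod.

```python
import re

# Multipliers for Kubernetes quantity suffixes (binary and decimal SI).
_MULTIPLIERS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
_QUANTITY = re.compile(r"([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)")


def parse_quantity(value) -> float:
    """Convert a quantity such as '3Gi', '512Mi', '250m' or 4 into a plain number."""
    match = _QUANTITY.fullmatch(str(value).strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {value!r}")
    number, suffix = match.groups()
    if suffix == "m":  # milli-units, typically CPU ("250m" == 0.25 cores)
        return float(number) / 1000
    if suffix not in _MULTIPLIERS:
        raise ValueError(f"unsupported suffix in quantity: {value!r}")
    return float(number) * _MULTIPLIERS[suffix]


def total_memory_gib(limits: dict) -> float:
    """Sum a mapping of component -> memory quantity and return the total in GiB."""
    return sum(parse_quantity(q) for q in limits.values()) / 2**30


if __name__ == "__main__":
    # Memory limits copied from the HelmRelease values in this issue;
    # "task pod" is one KubernetesExecutor pod built from kubernetesPodTemplate.
    limits = {
        "openmetadata server": "3Gi",
        "airflow web": "6Gi",
        "airflow scheduler": "4Gi",
        "airflow triggerer": "2Gi",
        "pgbouncer": "256Mi",
        "task pod": "2Gi",
    }
    print(total_memory_gib(limits))  # prints 17.25
```

With these values, one running ingestion pod brings the declared limits to 17.25 GiB, and each additional concurrent task pod adds another 2 GiB. Because a container's memory limit applies to all processes inside it, the scheduler's 4Gi limit also bounds the combined memory of any child processes running in the scheduler container.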

Hey @pmbrull! Did you find a solution to this problem? We've been having the same problem, and it still happens on OpenMetadata 1.9.7.

blagazsolt, Sep 12 '25

I'm facing a similar issue in our 1.8.7 k8s deployment: when multiple ingestion pipelines are deployed or triggered, the Airflow webserver hits its memory limit and restarts. @pmbrull, did you find a solution for this?

AmythD, Oct 01 '25

@AmythD, we have not been able to get to this yet. We are currently discussing it with a contributor to see if we can move it forward. Thank you!

pmbrull, Oct 02 '25