OpenMetadata
Review k8s Airflow setup spawning many workers
I am running OpenMetadata version 1.5.6.0, deployed with Helm. As soon as I create a metadata ingestion job from the UI, four new gunicorn workers are spawned whose parent process is airflow-scheduler, and that parent soon becomes a zombie process. Even after the job completes successfully, the worker processes seem to stay alive for a long time (not 100% sure about this). Each worker process uses around 500 MB of memory. Now that we are executing multiple jobs sequentially, I have seen 40 stale gunicorn worker processes still alive, and my VM is running out of resources. Am I doing something wrong?
From https://openmetadata.slack.com/archives/C02B6955S4S/p1744214845401779
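Not part of the original thread, but a quick way to verify that it really is gunicorn workers piling up is to count them and sum their resident memory from `ps` output. This is generic Linux tooling, not an OpenMetadata-specific API, and the sample numbers below are made up for illustration:

```shell
# Hypothetical sample of `ps -eo rss,comm` output (RSS in KB, then command name);
# on a live pod you would pipe the real `ps -eo rss,comm` in instead.
sample='512000 gunicorn
498000 gunicorn
102400 airflow-scheduler'

# Count gunicorn processes and sum their RSS, reported in MB.
echo "$sample" | awk '
  $2 ~ /gunicorn/ { n++; kb += $1 }
  END { printf "workers=%d total_mb=%.0f\n", n, kb/1024 }
'
# prints: workers=2 total_mb=986
```

If this number keeps growing after each ingestion run, the workers are not being reaped.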
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: openmetadata
spec:
  interval: 5m
  chart:
    spec:
      chart: openmetadata
      version: '1.5.6'
      sourceRef:
        kind: HelmRepository
        name: open-metadata
        namespace: openmetadata
  values:
    resources:
      requests:
        cpu: 250m
        memory: 3Gi
      limits:
        cpu: 1
        memory: 3Gi
    openmetadata:
      config:
        authorizer:
          className: "org.openmetadata.service.security.DefaultAuthorizer"
          containerRequestFilter: "org.openmetadata.service.security.JwtFilter"
        elasticsearch:
          searchType: opensearch
          port: 9200
          scheme: http
          connectionTimeoutSecs: 5
          socketTimeoutSecs: 60
          keepAliveTimeoutSecs: 600
          batchSize: 10
          auth:
            enabled: true
            password:
              secretRef: omd-es-secrets
              secretKey: password
        database:
          port: 5432
          driverClass: org.postgresql.Driver
          dbScheme: postgresql
          databaseName: openmetadata-db
          auth:
            username: openmetadata
            password:
              secretRef: omd-db-secrets
              secretKey: password
        pipelineServiceClientConfig:
          enabled: true
          # endpoint url for airflow
          apiEndpoint: http://openmetadata-dependencies-web.openmetadata.svc.cluster.local:8080
          auth:
            username: admin
            password:
              secretRef: omd-airflow-secrets
              secretKey: password
          metadataApiEndpoint: http://openmetadata.openmetadata.svc.cluster.local:8585/api
        authentication:
          enabled: false
        jwtTokenConfiguration:
          enabled: false
        eventMonitor:
          enabled: false
        smtpConfig:
          enableSmtpServer: false
        secretsManager:
          enabled: false
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: openmetadata-dependencies
spec:
  interval: 5m
  chart:
    spec:
      chart: openmetadata-dependencies
      version: '1.5.6'
      sourceRef:
        kind: HelmRepository
        name: open-metadata
        namespace: openmetadata
  values:
    mysql:
      enabled: false
    opensearch:
      enabled: false
    airflow:
      enabled: true
      airflow:
        image:
          repository: docker.getcollate.io/openmetadata/ingestion
          tag: 1.5.6.0
          pullPolicy: IfNotPresent
        executor: "KubernetesExecutor"
        config:
          # This is required for OpenMetadata UI to fetch status of DAGs
          AIRFLOW__API__AUTH_BACKENDS: "airflow.api.auth.backend.session,airflow.api.auth.backend.basic_auth"
          # OpenMetadata Airflow Apis Plugin DAGs Configuration
          AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/opt/airflow/dags"
        users:
          - username: admin
            password: ${ADMIN_PASSWORD}
            role: Admin
            email: [email protected]
            firstName: admin
            lastName: admin
        usersTemplates:
          ADMIN_PASSWORD:
            kind: secret
            name: omd-airflow-secrets
            key: password
        kubernetesPodTemplate:
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 4
              memory: 2Gi
        dbMigrations:
          resources:
            requests:
              cpu: 50m
              memory: 512Mi
            limits:
              cpu: 200m
              memory: 512Mi
        sync:
          resources:
            requests:
              cpu: 125m
              memory: 1Gi
            limits:
              cpu: 500m
              memory: 1Gi
      web:
        replicas: 1
        resources:
          requests:
            cpu: 250m
            memory: 6Gi
          limits:
            cpu: 1
            memory: 6Gi
      triggerer:
        resources:
          requests:
            cpu: 250m
            memory: 2Gi
          limits:
            cpu: 1
            memory: 2Gi
      pgbouncer:
        replicas: 1
        resources:
          requests:
            cpu: 25m
            memory: 256Mi
          limits:
            cpu: 200m
            memory: 256Mi
      scheduler:
        replicas: 1
        resources:
          requests:
            cpu: 1
            memory: 4Gi
          limits:
            cpu: 4
            memory: 4Gi
        logCleanup:
          enabled: false
      postgresql:
        enabled: false
      workers:
        enabled: false
      flower:
        enabled: false
      redis:
        enabled: false
      externalDatabase:
        type: postgres
        port: 5432
        database: openmetadata-airflow-db
        user: openmetadata-airflow
        passwordSecret: omd-airflow-secrets
        passwordSecretKey: postgresql-password
      serviceAccount:
        create: true
        name: "airflow"
      dags:
        persistence:
          enabled: true
          # NOTE: "" means cluster-default
          storageClass: ""
          size: 1Gi
          accessMode: ReadWriteMany
      logs:
        persistence:
          enabled: true
          # empty string means cluster-default
          storageClass: ""
          accessMode: ReadWriteMany
          size: 1Gi
```
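For what it's worth, the gunicorn workers in question usually belong to the Airflow webserver, and their count and recycling behaviour are controlled by standard Airflow `[webserver]` settings that can be passed through the chart's `airflow.config` map (the same map that already carries `AIRFLOW__API__AUTH_BACKENDS` above). A sketch, with illustrative values rather than tested recommendations:

```yaml
# Illustrative only: cap webserver gunicorn workers and recycle them
# more aggressively so stale workers release memory sooner.
airflow:
  airflow:
    config:
      AIRFLOW__WEBSERVER__WORKERS: "2"
      AIRFLOW__WEBSERVER__WORKER_REFRESH_BATCH_SIZE: "1"
      AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL: "60"
      AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT: "120"
```

Whether this addresses the zombie-scheduler symptom specifically is unverified; it only bounds the webserver's worker pool.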
Hey @pmbrull! Did you find a solution to this problem? We've been hitting the same issue; we are on OpenMetadata version 1.9.7 and it still happens.
I'm facing a similar issue in our 1.8.7 k8s deployment: when multiple ingestion pipelines are deployed or triggered, the Airflow webserver hits its memory limit and restarts. @pmbrull, did you find a solution for this?
@AmythD we have not been able to get our hands on this yet. We are currently discussing it with a contributor to see if we can move it forward. Thank you.