[worker] Possibility to define custom resource requests for `discover` job
Topic
worker config
Relevant information
Could you extend the discover job configuration on the worker side in the same way as for check jobs, so that custom resources can be defined instead of the default ones?
The current behavior leads to overprovisioning of the Kubernetes cluster, because discover jobs fall back to the default resource requests, which are sized for replication jobs and are higher than a discover job usually needs.
For example, the check job has this possibility - https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-workers/src/main/resources/application.yml#L151
But there is no equivalent for discover jobs - https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-workers/src/main/resources/application.yml#L154
Proposal
worker:
  kube-job-configs:
    ...
    check:
      annotations: ${CHECK_JOB_KUBE_ANNOTATIONS:}
      labels: ${CHECK_JOB_KUBE_LABELS:}
      node-selectors: ${CHECK_JOB_KUBE_NODE_SELECTORS:}
      cpu-limit: ${CHECK_JOB_MAIN_CONTAINER_CPU_LIMIT:}
      cpu-request: ${CHECK_JOB_MAIN_CONTAINER_CPU_REQUEST:}
      memory-limit: ${CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT:}
      memory-request: ${CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST:}
    discover:
      annotations: ${DISCOVER_JOB_KUBE_ANNOTATIONS:}
      labels: ${DISCOVER_JOB_KUBE_LABELS:}
      node-selectors: ${DISCOVER_JOB_KUBE_NODE_SELECTORS:}
      cpu-limit: ${DISCOVER_JOB_MAIN_CONTAINER_CPU_LIMIT:}
      cpu-request: ${DISCOVER_JOB_MAIN_CONTAINER_CPU_REQUEST:}
      memory-limit: ${DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT:}
      memory-request: ${DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST:}
Thanks for the request @ivan-sukhomlyn. I have added it to the platform team backlog.
@davinchia now that the limit on reading large catalogs is gone, this may be necessary to avoid OOM during schema discovery.
+1. We also need this; otherwise we have to set resource.requests and resource.limits for all connectors (source/destination) to too high a default value just for the initial setup of a connection.
Definitely. This is something we are going to start looking at in the next quarter.
Hi @davinchia any news on this?
Fixed: discover job resources can now be defined via the workload-launcher env vars. 🎉
References:
- https://github.com/airbytehq/airbyte-platform/blob/630df9c91f6a0e1feecf63546caad00aa77812ef/airbyte-workload-launcher/src/main/resources/application.yml#L195
- https://github.com/airbytehq/airbyte/issues/48816#issuecomment-2589558630
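
For anyone landing here, a rough sketch of what setting these on the workload-launcher container could look like. This is a plain Kubernetes container `env` fragment; the variable names are taken from the proposal above and are an assumption, so check the linked `application.yml` for the names that actually shipped. The values are illustrative only.

```yaml
# Sketch only: env entries for the workload-launcher container.
# Variable names are assumed from the proposal in this issue and
# may differ from the ones in the linked application.yml.
env:
  - name: DISCOVER_JOB_MAIN_CONTAINER_CPU_REQUEST
    value: "250m"    # illustrative value
  - name: DISCOVER_JOB_MAIN_CONTAINER_CPU_LIMIT
    value: "500m"    # illustrative value
  - name: DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST
    value: "512Mi"   # illustrative value
  - name: DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT
    value: "1Gi"     # illustrative value
```

Keeping the discover requests separate from the connector defaults should avoid raising resource.requests and resource.limits for every source and destination just to get through the initial connection setup.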