[worker] Possibility to define custom resource requests for `discover` job

Open ivan-sukhomlyn opened this issue 1 year ago • 3 comments

Topic

worker config

Relevant information

Could you extend the discover job configuration on the worker side, in the same manner as for check jobs, so that custom resources can be defined instead of the default ones?

The current behavior leads to overprovisioning of the Kubernetes cluster, since the resource requests end up higher than replication jobs usually need.

For example, the check job has this possibility - https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-workers/src/main/resources/application.yml#L151

But there is no such option for discover jobs - https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-workers/src/main/resources/application.yml#L154

Proposal

  worker:
    kube-job-configs:
...
      check:
        annotations: ${CHECK_JOB_KUBE_ANNOTATIONS:}
        labels: ${CHECK_JOB_KUBE_LABELS:}
        node-selectors: ${CHECK_JOB_KUBE_NODE_SELECTORS:}
        cpu-limit: ${CHECK_JOB_MAIN_CONTAINER_CPU_LIMIT:}
        cpu-request: ${CHECK_JOB_MAIN_CONTAINER_CPU_REQUEST:}
        memory-limit: ${CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT:}
        memory-request: ${CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST:}
      discover:
        annotations: ${DISCOVER_JOB_KUBE_ANNOTATIONS:}
        labels: ${DISCOVER_JOB_KUBE_LABELS:}
        node-selectors: ${DISCOVER_JOB_KUBE_NODE_SELECTORS:}
        cpu-limit: ${DISCOVER_JOB_MAIN_CONTAINER_CPU_LIMIT:}
        cpu-request: ${DISCOVER_JOB_MAIN_CONTAINER_CPU_REQUEST:}
        memory-limit: ${DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT:}
        memory-request: ${DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST:}
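If the proposed keys were adopted as written, the discover job resources could be set through environment variables on the worker container. The snippet below is only a sketch: the variable names are the ones proposed above, and the values and the surrounding `env` list are illustrative assumptions, not settings taken from an Airbyte release.

```yaml
# Hypothetical excerpt of the worker container spec in a Kubernetes Deployment.
# Variable names follow the proposal above; all values are placeholders.
env:
  - name: DISCOVER_JOB_MAIN_CONTAINER_CPU_REQUEST
    value: "250m"
  - name: DISCOVER_JOB_MAIN_CONTAINER_CPU_LIMIT
    value: "1"
  - name: DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST
    value: "512Mi"
  - name: DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT
    value: "2Gi"
```

With separate values like these, the defaults used for replication jobs would not need to be raised just to keep discover from running out of memory.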

ivan-sukhomlyn avatar Jun 04 '24 15:06 ivan-sukhomlyn

Thanks for the request, @ivan-sukhomlyn. I added it to the platform team backlog.

@davinchia now that the limit on reading large catalogs is gone, this may be necessary to avoid OOM during discover schema.

marcosmarxm avatar Jun 05 '24 14:06 marcosmarxm

+1. We also need this enabled; otherwise we have to set resource.requests and resource.limits for all connectors (source/destination) to an excessively high default value just for the initial setup of a connection.

mateocolina avatar Aug 22 '24 10:08 mateocolina

Definitely. This is something we are going to start looking at in the next quarter.

davinchia avatar Aug 22 '24 15:08 davinchia

Hi @davinchia, any news on this?

mateocolina avatar Feb 07 '25 09:02 mateocolina

Fixed: discover job resources can now be defined via the workload-launcher env vars. 🎉

References:

  • https://github.com/airbytehq/airbyte-platform/blob/630df9c91f6a0e1feecf63546caad00aa77812ef/airbyte-workload-launcher/src/main/resources/application.yml#L195
  • https://github.com/airbytehq/airbyte/issues/48816#issuecomment-2589558630

ivan-sukhomlyn avatar Feb 14 '25 12:02 ivan-sukhomlyn