Chart: Enhance Celery Worker Sets support for multi-queue configurations

Open glennhsh opened this issue 1 month ago • 3 comments

Description

This PR enhances the Airflow Helm chart to support advanced Celery worker topologies, enabling more flexible resource allocation and precise autoscaling configurations.

Why is this needed?

1. Flexible Worker Topologies: As Airflow adoption grows, platform teams often need to route tasks exclusively to specialized worker sets (e.g., GPU-optimized, Memory-optimized) without maintaining a generic "default" worker.

  • Enhancement: The new workers.enableDefault flag allows users to configure a deployment consisting only of specialized worker sets defined in workers.sets. This provides greater flexibility for teams to design their worker architecture exactly as needed.
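
As a rough illustration of the intended behavior (not the chart's actual template logic), the effect of workers.enableDefault can be sketched in Python. The function name, the "default" label, and the assumption that each entry in workers.sets is a mapping with a `name` key are all hypothetical and used only for this example:

```python
def workers_to_render(workers: dict) -> list[str]:
    """Return the names of the worker groups that would be rendered.

    Hypothetical model: `workers` mirrors the `workers` section of
    values.yaml, and each entry in `sets` is assumed to have a `name`.
    """
    names = []
    # enableDefault defaults to true, per the PR description.
    if workers.get("enableDefault", True):
        names.append("default")  # illustrative label for the default worker
    names.extend(s["name"] for s in workers.get("sets", []))
    return names


# Only specialized sets, no generic default worker.
print(workers_to_render({"enableDefault": False,
                         "sets": [{"name": "gpu"}, {"name": "highmem"}]}))
```

With enableDefault omitted, the default worker is still included, which matches the stated default of true.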

2. Multi-Queue Autoscaling Support: Complex workflows often require a single worker set to handle tasks from multiple specific queues (e.g., queue: "high-priority,vip").

  • Enhancement: This PR updates the KEDA ScaledObject generation to support comma-separated queue lists. By using the SQL IN (...) clause, we ensure that KEDA scales worker sets based on the precise aggregate workload of all their assigned queues.
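
The queue-list handling described above can be sketched in Python as follows. This is a minimal model of the string transformation only, not the Helm template itself; the function name `queue_filter` is hypothetical, and the quote escaping is an added assumption rather than something the PR description states:

```python
def queue_filter(queue_value: str) -> str:
    """Build the SQL filter fragment for a comma-separated queue string."""
    # Split on commas, trim surrounding whitespace, drop empty entries.
    queues = [q.strip() for q in queue_value.split(",") if q.strip()]
    if not queues:
        raise ValueError("at least one queue name is required")
    # Assumption: escape single quotes by doubling them, as in SQL literals.
    quoted = ",".join("'" + q.replace("'", "''") + "'" for q in queues)
    return f"AND queue IN ({quoted})"


print(queue_filter("a,b"))                   # AND queue IN ('a','b')
print(queue_filter(" high-priority , vip "))  # AND queue IN ('high-priority','vip')
```

A single queue goes through the same path and yields a one-element IN list, so no separate branch for the single-queue case is needed in this sketch.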

3. Granular Configuration Overrides: Different worker sets may require different operational strategies within the same cluster.

  • Enhancement: This change improves the configuration merge logic, allowing individual worker sets to override global settings. For example, a user can now enable KEDA globally but explicitly disable it for a specific worker set that requires a static number of replicas.
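
One way to picture the override behavior is a recursive merge in which a worker set's values win over the global ones, including explicit false values. The Python below is a sketch under that assumption; it does not reproduce the chart's template helpers, and the function name and example values are illustrative:

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Return `base` with `override` applied recursively; override wins."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key instead of replacing them.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars (including False) and lists replace the global value.
            merged[key] = deepcopy(value)
    return merged


# Illustrative values: KEDA enabled globally, disabled for one static set.
global_workers = {"replicas": 1, "keda": {"enabled": True, "maxReplicaCount": 10}}
static_set = {"name": "static-pool", "replicas": 3, "keda": {"enabled": False}}

merged = merge_overrides(global_workers, static_set)
print(merged["keda"]["enabled"], merged["replicas"])  # False 3
```

The key point is that an explicit `False` in the set is treated as a real override rather than as a missing value, while untouched keys such as maxReplicaCount are inherited from the global settings.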

Changes

  • New Feature: Added workers.enableDefault (default: true) to values.yaml.
  • Enhancement: Updated worker-kedaautoscaler.yaml to use the SQL IN clause for queue filtering, supporting multi-queue configurations (e.g., queue: "a,b" -> AND queue IN ('a','b')).
  • Refactor: Standardized template rendering to ensure consistent behavior between the default worker and workers.sets.

Testing

  • Added test cases in helm-tests/tests/helm_tests/other/test_keda.py to verify:
    • Correct SQL generation for single queues.
    • Correct SQL generation for comma-separated queue lists using the IN clause.
    • Proper handling of whitespace in queue configurations.
  • Verified that workers.enableDefault correctly controls the rendering of the default worker deployment.

closes: #56591 #34219



glennhsh avatar Nov 21 '25 07:11 glennhsh

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

boring-cyborg[bot] avatar Nov 21 '25 07:11 boring-cyborg[bot]

I haven't looked at either PR, but #56589 has been open for a month with this same feature.

jedcunningham avatar Nov 24 '25 03:11 jedcunningham

I was also puzzled that I had overlooked that there are two PRs for the same feature. I actually reviewed both, and only after coming back some days later did I realize the overlap and attempt to compare them.

I like THIS PR a bit more compared to #56589 because (1) it is leaner and easier to read as a diff and (2) it also extends KEDA and HPA in the PR, which is explicitly excluded in the other. Even though the other was there first, I'd propose to merge this one.

@jedcunningham Can you check and compare as a second pair of eyes? I would propose to merge this one.

jscheffl avatar Dec 07 '25 22:12 jscheffl

@jedcunningham ping?

jscheffl avatar Dec 14 '25 14:12 jscheffl