
ProcessPoolExecutor in CeleryExecutor should be reused

Open luoyuliuyin opened this issue 1 year ago • 0 comments

Apache Airflow version

2.9.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When the scheduler sends tasks to Celery and there is only one task in the current loop, the task is sent from the main thread. If there are multiple tasks, a process pool (ProcessPoolExecutor) is created with a size based on the number of CPU cores, and all tasks are sent through that pool. The problem with the current implementation is that the scheduler creates a new pool on every scheduling loop, which adds significant overhead. The pool could be reused instead.
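The per-loop behaviour described above can be modelled roughly as follows. This is a simplified sketch, not Airflow's actual code: `send_batch_fresh_pool`, `send_one` and `pool_factory` are hypothetical names, and `pool_factory` exists only so the sketch can be exercised with a thread pool instead of real worker processes.

```python
import concurrent.futures as cf
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def send_batch_fresh_pool(
    send_one: Callable[[T], R],
    tasks: Sequence[T],
    max_workers: int,
    pool_factory: Callable[[int], cf.Executor] = cf.ProcessPoolExecutor,
) -> list[R]:
    """Hypothetical model of the current pattern: a new pool per batch."""
    if len(tasks) <= 1 or max_workers == 1:
        # Zero or one task: send it in the calling thread, no pool needed.
        return [send_one(t) for t in tasks]
    # A brand-new pool is created on every call and torn down again
    # when the `with` block exits, so its workers are never reused.
    with pool_factory(min(len(tasks), max_workers)) as pool:
        return list(pool.map(send_one, tasks))
```

Each call pays the cost of starting and stopping up to `min(len(tasks), max_workers)` workers before any task is sent, which is the overhead this issue is about.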

When I tested, sending 32 tasks sometimes took almost 4 seconds.

What you think should happen instead?

Reuse the ProcessPoolExecutor in CeleryExecutor across scheduling loops instead of creating a new one every time.
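A minimal sketch of the proposed reuse, assuming the pool size can be fixed once when the executor starts rather than computed per batch. `ReusableSendPool` and its methods are hypothetical names, not an existing Airflow API; `pool_factory` defaults to `ProcessPoolExecutor` and can be swapped for a thread pool when experimenting.

```python
import concurrent.futures as cf
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ReusableSendPool:
    """Hypothetical helper: creates the worker pool once and reuses it."""

    def __init__(
        self,
        max_workers: int,
        pool_factory: Callable[[int], cf.Executor] = cf.ProcessPoolExecutor,
    ) -> None:
        self._max_workers = max_workers
        self._pool_factory = pool_factory
        self._pool: Optional[cf.Executor] = None

    def _get_pool(self) -> cf.Executor:
        # Create the pool lazily on first use, then keep it for later batches.
        if self._pool is None:
            self._pool = self._pool_factory(self._max_workers)
        return self._pool

    def send_batch(self, send_one: Callable[[T], R], tasks: Sequence[T]) -> list[R]:
        if len(tasks) <= 1 or self._max_workers == 1:
            # Small batches still run in the calling thread.
            return [send_one(t) for t in tasks]
        # Larger batches go through the same long-lived pool every time.
        return list(self._get_pool().map(send_one, tasks))

    def shutdown(self) -> None:
        # Release the workers once, e.g. when the executor itself ends.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```

With a real `ProcessPoolExecutor`, `send_one` and the task objects must be picklable. The trade-off of a long-lived pool is that it has to be shut down explicitly when the executor ends, and a pool that raises `BrokenProcessPool` would need to be recreated rather than reused.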

How to reproduce

always

Operating System

macOS

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

luoyuliuyin, May 08 '24 11:05