Investigate if we can replace `GunicornMonitor` with `uvicorn.run()`

Open kaxil opened this issue 1 year ago • 6 comments

We most likely no longer need GunicornMonitor or UvicornMonitor. @ashb's suggestion is that uvicorn.run() should be enough for Airflow.

Whoever takes this GitHub issue should verify this and, if the monitor is not needed, replace it.

The code:

  • https://github.com/apache/airflow/blob/f38d56dbf4dc1639142fc5a494d5da24996a56cc/airflow/cli/commands/fastapi_api_command.py#L159-L190
  • https://github.com/apache/airflow/blob/f38d56dbf4dc1639142fc5a494d5da24996a56cc/airflow/cli/commands/webserver_command.py#L49-L107

kaxil avatar Oct 15 '24 13:10 kaxil

The current GunicornMonitor provides the following capabilities:

  1. Automatic worker restarts if workers crash or hang: If a worker crashes or becomes unresponsive, it is automatically restarted.

  2. Graceful worker scaling and reloads: This allows for addition and removal of workers and reloads workers gracefully when needed.

  3. Timeout management for unresponsive workers: Gunicorn monitors workers for unresponsiveness and can terminate them if they exceed a set timeout, preventing hangs.

If we switch to uvicorn.run(), we would lose these features since uvicorn.run() lacks built-in process management. Specifically:

If a worker dies, there's no master process to restart it. There will be no automatic scaling of workers, and no handling of worker timeouts or periodic restarts. To replicate this functionality, we would need an external process manager such as systemd or supervisord, which adds complexity and overhead.
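To make the "master process restarts a dead worker" idea concrete, here is a minimal sketch of such a loop. It is not Gunicorn's or Airflow's code: `supervise` and its parameters are hypothetical names, and the loop is capped at `max_restarts` only so that the example terminates.

```python
import subprocess
import sys
import time


def supervise(cmd, num_workers, max_restarts, poll_interval=0.05):
    """Keep `num_workers` copies of `cmd` running, restarting any that exit.

    Hypothetical helper for illustration. It stops after `max_restarts`
    restarts; a real master process would loop until told to shut down.
    """
    workers = [subprocess.Popen(cmd) for _ in range(num_workers)]
    restarts = 0
    try:
        while restarts < max_restarts:
            for i, proc in enumerate(workers):
                # poll() returns None while the child is still running.
                if proc.poll() is not None and restarts < max_restarts:
                    workers[i] = subprocess.Popen(cmd)
                    restarts += 1
            time.sleep(poll_interval)
    finally:
        # Clean up whatever is still running when we stop supervising.
        for proc in workers:
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
    return restarts


if __name__ == "__main__":
    # Workers that exit immediately, so every poll finds something to restart.
    short_lived = [sys.executable, "-c", "pass"]
    print(supervise(short_lived, num_workers=2, max_restarts=3))
```

This only covers crash recovery. Detecting a worker that is alive but hung needs a separate liveness signal, which is what a timeout setting provides.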

cc: @kaxil @ashb

vatsrahul1001 avatar Dec 17 '24 09:12 vatsrahul1001

For 2: https://docs.gunicorn.org/en/stable/signals.html

  • TTIN: Increment the number of processes by one
  • TTOU: Decrement the number of processes by one

"If a worker dies, there's no master process to restart it"

Doesn't Gunicorn do that itself? https://docs.gunicorn.org/en/stable/design.html#master

The master process is a simple loop that listens for various process signals and reacts accordingly. It manages the list of running workers by listening for signals like TTIN, TTOU, and CHLD. TTIN and TTOU tell the master to increase or decrease the number of running workers. CHLD indicates that a child process has terminated, in this case the master process automatically restarts the failed worker.

So it's only the case of "worker hang" that might not be there anymore. Let me think.
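As a sketch of how the TTIN/TTOU signals quoted above can be sent from Python (Unix only): `scale_workers` is a hypothetical helper, not part of Gunicorn or Airflow, and `master_pid` is assumed to be the PID of a running Gunicorn master.

```python
import os
import signal


def scale_workers(master_pid, delta, kill=os.kill):
    """Ask a Gunicorn master to add (delta > 0) or remove (delta < 0) workers.

    Each TTIN/TTOU signal changes the worker count by one, so abs(delta)
    signals are sent. `kill` is injectable so the function can be exercised
    without a real process.
    """
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    for _ in range(abs(delta)):
        kill(master_pid, sig)
    return sig, abs(delta)


if __name__ == "__main__":
    sent = []
    # Record the signals instead of delivering them to a real process.
    scale_workers(1234, 2, kill=lambda pid, sig: sent.append((pid, sig)))
    print(sent)
```

Standard Unix signals are not queued, so a real caller may want to pause between signals rather than send them back to back.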

ashb avatar Dec 17 '24 12:12 ashb

For 1: https://docs.gunicorn.org/en/stable/settings.html#timeout I think?
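For reference, the timeout setting linked above is normally set in a Gunicorn config file, which is plain Python. A sketch follows; all values are hypothetical and are not Airflow's defaults.

```python
# gunicorn.conf.py -- illustrative values only, not Airflow's configuration.

bind = "0.0.0.0:8080"

# Number of worker processes the master keeps alive.
workers = 4

# Run the ASGI app through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Workers silent for more than this many seconds are killed and restarted.
timeout = 120

# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 30
```

With these settings the Gunicorn master itself handles the hung-worker case, without a separate monitor process.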

ashb avatar Dec 17 '24 12:12 ashb

Just one comment here: I've heard (mostly through the grapevine) that for quite a long time uvicorn has been able to manage multiple processes and handle sync requests directly on its own, that this is more and more recommended in production, and that there is basically no need to use gunicorn at all.

Again, it's more of an "overheard" thing, but looking at https://www.uvicorn.org/deployment/#using-a-process-manager, maybe that's what we are looking for? (Or maybe I misunderstood what we want to do; I just wanted to mention that gunicorn might not be needed at all.)

potiuk avatar Dec 17 '24 19:12 potiuk

To perform the comparison, I replaced the Gunicorn code in the else block with the uvicorn.run() call below:

  uvicorn.run(
      "airflow.api_fastapi.main:app", host=args.hostname, port=args.port, workers=num_workers,
      timeout_keep_alive=worker_timeout, timeout_graceful_shutdown=worker_timeout,
      ssl_keyfile=ssl_key, ssl_certfile=ssl_cert, access_log=access_logfile,
  )

I used Locust for performance testing with the configuration below.

These are the stats comparing uvicorn.run() with Gunicorn + GunicornMonitor:

Comparison: Uvicorn vs. Gunicorn Performance

Request Statistics

| Metric | Uvicorn | Gunicorn |
|---|---|---|
| Total Requests | 14,714 | 14,726 |
| Total Failures | 0 | 13 |
| Average Response Time | 12.05 ms | 13.46 ms |
| Min Response Time | 7 ms | 1 ms |
| Max Response Time | 195 ms | 216 ms |
| Average Size (bytes) | 4,608 | 4,603.93 |
| Requests Per Second (RPS) | 49.05 | 49.09 |
| Failures Per Second | 0 | 0.04 |
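
To put the table in proportion, the short calculation below derives a few ratios from the reported numbers. The dictionary keys and helper names are made up for this sketch; only the figures come from the table. It gives a Gunicorn failure rate of roughly 0.09% of requests, a run length of about 300 seconds for both servers, and an average response time about 10% lower for Uvicorn.

```python
# Figures copied from the request statistics table; names are illustrative.
uvicorn_stats = {"requests": 14_714, "failures": 0, "avg_ms": 12.05, "rps": 49.05}
gunicorn_stats = {"requests": 14_726, "failures": 13, "avg_ms": 13.46, "rps": 49.09}


def failure_rate(stats):
    """Fraction of requests that failed."""
    return stats["failures"] / stats["requests"]


def duration_s(stats):
    """Approximate run length implied by total requests and RPS."""
    return stats["requests"] / stats["rps"]


# Relative reduction in average response time, Uvicorn vs. Gunicorn.
latency_gain = (gunicorn_stats["avg_ms"] - uvicorn_stats["avg_ms"]) / gunicorn_stats["avg_ms"]

if __name__ == "__main__":
    print(f"Gunicorn failure rate: {failure_rate(gunicorn_stats):.4%}")
    print(f"Run length (Gunicorn): {duration_s(gunicorn_stats):.0f} s")
    print(f"Average latency reduction: {latency_gain:.1%}")
```

The run length also cross-checks the table: 13 failures over about 300 seconds is about 0.04 failures per second.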

Observations

  1. Response Times:

    • Uvicorn demonstrates slightly lower average and maximum response times compared to Gunicorn.
    • Percentile analysis shows Uvicorn's response times are more consistent, with fewer extreme values at higher percentiles.
  2. Failures:

    • Uvicorn had no failures, whereas Gunicorn recorded 13 failures caused by RemoteDisconnected errors. This could indicate potential issues in connection handling under load.
  3. Performance Consistency:

    • Uvicorn offers better consistency and reliability based on the above data.

vatsrahul1001 avatar Dec 19 '24 13:12 vatsrahul1001

Nice!

potiuk avatar Dec 19 '24 13:12 potiuk