Unhandled Exception in remote logging if connection doesn't exist

Open csp33 opened this issue 4 months ago • 5 comments

Apache Airflow version

3.0.4

If "Other Airflow 2 version" selected, which one?

No response

What happened?

After upgrading our Airflow deployment from version 3.0.3 to 3.0.4, all tasks started to fail with a RuntimeError: generator didn't yield. Our Airflow instance is deployed on Kubernetes and is configured to use S3 for remote logging.

The error occurred because a missing aws_default connection caused a failure during the logging setup phase for tasks. This seems to be a new requirement in version 3.0.4 that wasn't present in 3.0.3, or a regression that causes a critical failure when the connection is absent.

Worker logs traceback:

2025-08-14 09:02:05.978736 [warning  ] Server error              [airflow.sdk.api.client] detail={'detail': {'reason': 'not_found', 'message': 'Connection with ID aws_default not found'}}
2025-08-14 09:02:05.979060 [error    ] Connection not found          [airflow.sdk.api.client] conn_id=aws_default detail={'detail': {'reason': 'not_found', 'message': 'Connection with ID aws_default not found'}} status_code=404
2025-08-14 09:02:05.985273 [error    ] Task execute_workload[fe7b5051-82c5-4dec-880d-e4e7c37b3ef2] raised unexpected: RuntimeError("generator didn't yield") [celery.app.trace]

The traceback points to this line in the code: https://github.com/apache/airflow/blob/3.0.4/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L1700

│ /app/.venv/lib/python3.11/site-packages/airflow/sdk/execution_time/supervisor.py:1700 in      │
│ _configure_logging                                                                           │
│                                                                                              │
│ ❱ 1700    with _remote_logging_conn(client):                                                 │
...
RuntimeError: generator didn't yield
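
For context on the message itself: `generator didn't yield` is what Python's `contextlib.contextmanager` raises when the decorated generator finishes without ever reaching its `yield`. The sketch below reproduces the message outside Airflow. `fetch_connection` and `remote_logging_conn` are hypothetical stand-ins, not the real `_remote_logging_conn`, and the sketch assumes the real function exits early when the connection lookup fails.

```python
from contextlib import contextmanager


def fetch_connection(conn_id):
    """Hypothetical lookup that finds nothing, like the 404 in the log above."""
    return None


@contextmanager
def remote_logging_conn(conn_id):
    conn = fetch_connection(conn_id)
    if conn is None:
        # Returning before the yield is the failure pattern: contextlib has
        # nothing to enter, so the `with` statement raises RuntimeError.
        return
    yield conn


def reproduce():
    """Enter the context manager and return the resulting error message."""
    try:
        with remote_logging_conn("aws_default"):
            pass
    except RuntimeError as exc:
        return str(exc)
    return None


print(reproduce())  # generator didn't yield
```

The error therefore says nothing about the missing connection; it only reports that the context manager never reached its `yield`.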

What you think should happen instead?

Tasks should execute successfully. If a connection is required for remote logging, Airflow should either:

  • Use a different, non-critical logging path if the connection is missing and warn users.

  • Provide a clear, helpful error message about the missing connection instead of a cryptic RuntimeError: generator didn't yield.

  • Include a warning in the release notes about the new requirement for an aws_default connection when using S3 remote logging.
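
One possible shape for the first two options, as a minimal sketch rather than a proposed patch: a context manager that yields on every path and logs a clear warning when the connection is missing. All names here (`ConnectionNotFound`, `fetch_connection`, `remote_logging_conn`, the `known` dict) are hypothetical and do not reflect Airflow's actual API.

```python
import logging
from contextlib import contextmanager

log = logging.getLogger("remote_logging_sketch")


class ConnectionNotFound(Exception):
    """Hypothetical error raised when a connection ID does not exist."""


def fetch_connection(conn_id, known):
    """Hypothetical lookup; `known` is a dict standing in for the connection store."""
    try:
        return known[conn_id]
    except KeyError:
        raise ConnectionNotFound(conn_id) from None


@contextmanager
def remote_logging_conn(conn_id, known):
    try:
        conn = fetch_connection(conn_id, known)
    except ConnectionNotFound:
        # Name the missing connection explicitly instead of failing cryptically.
        log.warning(
            "Remote logging connection %r not found; using local logs only",
            conn_id,
        )
        conn = None
    # Yield on every path so entering the context manager never fails.
    yield conn


with remote_logging_conn("aws_default", known={}) as conn:
    remote_enabled = conn is not None

print(remote_enabled)  # False
```

Whether a silent fallback or a hard failure with a better message is preferable is a design decision for the maintainers; the sketch only shows that both need the `yield` to be reachable or a dedicated exception to be raised.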

How to reproduce

  • Deploy an Airflow instance on Kubernetes.

  • Configure remote_logging to use S3 in airflow.cfg or through environment variables.

  • Ensure that there is no Airflow connection named aws_default.

  • Upgrade the Airflow version to 3.0.4.

  • Run a DAG with one or more tasks.

  • Observe the task failure and the logs.
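
For the configuration step above, Airflow reads settings from environment variables named `AIRFLOW__{SECTION}__{KEY}`. The sketch below builds the variables for S3 remote logging. The values are illustrative assumptions: the bucket path is a placeholder, and the issue does not state whether `remote_log_conn_id` was set explicitly.

```python
def airflow_env_name(section, key):
    """Build the AIRFLOW__{SECTION}__{KEY} environment variable name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Illustrative values only; the bucket path is a placeholder and the
# connection ID is an assumption based on the error in the logs.
settings = {
    ("logging", "remote_logging"): "True",
    ("logging", "remote_base_log_folder"): "s3://example-bucket/airflow-logs",
    ("logging", "remote_log_conn_id"): "aws_default",
}

env = {
    airflow_env_name(section, key): value
    for (section, key), value in settings.items()
}

for name, value in env.items():
    print(f"{name}={value}")
```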

Operating System

Airflow in k8s

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

csp33, Aug 14 '25 09:08

Probably related: https://github.com/apache/airflow/pull/53719

csp33, Aug 14 '25 09:08

cc @ashb

eladkal, Aug 14 '25 11:08

Yeah the error should be improved, but I think an exception is the right behaviour here?

ashb, Aug 14 '25 16:08

We could continue to log things to a local file, and in that file log that the remote log couldn't be uploaded -- I can see that perhaps being more useful than the current behaviour too

ashb, Aug 14 '25 17:08

I hit the same error on 3.1.3.

xuannguyenhehe, Dec 10 '25 16:12

The underlying problem is that the connection is requested from the API server through an API call, and that endpoint only checks the DB. If the connection isn't there, it returns a 404, so connections defined through environment variables are ignored. I would argue those should even take priority. Why not check env-var-defined connections before calling the API server at all? @ashb

Edit: I do understand why the API server doesn't return these connections, since they have never been shown in the UI. Still, that makes env-var-defined connections less useful for internal usage.

Edit 2: TL;DR: add the connection to the actual Airflow DB connections, via the UI or a deployment hook.

dada-engineer, Dec 11 '25 07:12
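
The lookup order proposed in the last comment (environment variables first, API server second) could be sketched roughly as follows. It relies on Airflow's `AIRFLOW_CONN_{CONN_ID}` naming convention for env-var-defined connections; `resolve_connection`, `fake_api_lookup`, and the `environ` parameter are hypothetical names for illustration and do not describe how the supervisor actually resolves connections.

```python
import os


def env_var_name(conn_id):
    """Name of the environment variable that would define this connection."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


def resolve_connection(conn_id, api_lookup, environ=None):
    """Check the environment first and only then ask the API server.

    `api_lookup` is a hypothetical callable standing in for the API call.
    Returns a (source, value) pair so the caller can see where it came from.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_var_name(conn_id))
    if value is not None:
        return ("env", value)
    return ("api", api_lookup(conn_id))


def fake_api_lookup(conn_id):
    """Stand-in for the API server; returns None like a 404 would imply."""
    return None


source, value = resolve_connection(
    "aws_default",
    fake_api_lookup,
    environ={"AIRFLOW_CONN_AWS_DEFAULT": "aws://"},
)
print(source, value)  # env aws://
```

With this ordering, a connection defined only through an environment variable would be found without an API round trip, and the 404 path would be reached only when neither source has it.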