airflow_db_cleanup task cleanup_sessions fails due to high RAM usage
In which file did you encounter the issue?
composer/workflows/airflow_db_cleanup.py
Lines 466 to 491
Did you change the file? If so, how?
We commented out the SELECT statements that count how many sessions have been deleted (plus some minor line-break changes).
def cleanup_sessions():
    session = settings.Session()
    try:
        logging.info("Deleting sessions...")
        # before = len(session.execute(text("SELECT * FROM session WHERE expiry < now()::timestamp(0);")).mappings().all())
        session.execute(text("DELETE FROM session WHERE expiry < now()::timestamp(0);"))
        # after = len(session.execute(text("SELECT * FROM session WHERE expiry < now()::timestamp(0);")).mappings().all())
        # logging.info("Deleted {} expired sessions.".format(before - after))
    except Exception as e:
        logging.error(e)
    session.commit()
    session.close()
Describe the issue
The cleanup_sessions task got killed for excessive RAM usage. We see spikes in RAM usage every time this job runs, and the SIGKILL supports this hypothesis.
[...]
[2023-12-15, 00:01:55 UTC] {cleanup.py:427} INFO - Deleting sessions...
[2023-12-15, 00:02:17 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code Negsignal.SIGKILL
This issue may not arise in bigger Cloud Composer configurations.
It would be more elegant and performant not to select all rows and count them in Python, but to count in the database, e.g. by using
SELECT COUNT(*) FROM session WHERE expiry < now()::timestamp(0);
so that only integers are handled in Python. A sketch of this approach is shown below.
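A minimal sketch of that counting-in-the-database approach, assuming the same settings.Session() and sqlalchemy text() usage as the original sample; the rollback/finally handling here is an illustrative addition, not part of the original file:

import logging

from airflow import settings
from sqlalchemy import text


def cleanup_sessions():
    session = settings.Session()
    try:
        logging.info("Deleting sessions...")
        # Count expired sessions in the database; only a single integer crosses the wire.
        expired = session.execute(
            text("SELECT COUNT(*) FROM session WHERE expiry < now()::timestamp(0);")
        ).scalar()
        session.execute(text("DELETE FROM session WHERE expiry < now()::timestamp(0);"))
        logging.info("Deleted %d expired sessions.", expired)
        session.commit()
    except Exception as e:
        logging.error(e)
        session.rollback()
    finally:
        session.close()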
Assigning to @michalmodras for further triage
I think that this exact change was already introduced: https://github.com/GoogleCloudPlatform/python-docs-samples/pull/11035
@leahecole I think we can safely close this issue; this question was posted in the meantime and didn't reach me, even though I was involved in reviewing that PR.
@apilaskowski You're right. I didn't see that PR 👍