airflow_db_cleanup task cleanup_sessions fails due to high RAM usage

Open alexhaas-otto opened this issue 1 year ago • 1 comments

In which file did you encounter the issue?

composer/workflows/airflow_db_cleanup.py

Line 466 to 491

Did you change the file? If so, how?

We commented out the SELECT statements that count how many sessions have been deleted (and made some minor line-break changes).

def cleanup_sessions():
    session = settings.Session()

    try:
        logging.info("Deleting sessions...")
        # before = len(session.execute(text("SELECT * FROM session WHERE expiry < now()::timestamp(0);")).mappings().all())
        session.execute(text("DELETE FROM session WHERE expiry < now()::timestamp(0);"))
        # after = len(session.execute(text("SELECT * FROM session WHERE expiry < now()::timestamp(0);")).mappings().all())
        # logging.info("Deleted {} expired sessions.".format(before - after))
    except Exception as e:
        logging.error(e)

    session.commit()
    session.close()

Describe the issue

The cleanup_sessions task was killed for using too much RAM. We see a spike in RAM usage every time this job runs, and the SIGKILL supports this hypothesis.

[...]
[2023-12-15, 00:01:55 UTC] {cleanup.py:427} INFO - Deleting sessions...
[2023-12-15, 00:02:17 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code Negsignal.SIGKILL

This issue may not arise in bigger Cloud Composer configurations.

It would be more elegant and performant to count in the database instead of selecting every row and counting in Python, e.g. with SELECT COUNT(*) FROM session WHERE expiry < now()::timestamp(0); so that Python only ever handles integers.
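A minimal sketch of that suggestion, not the code from the sample itself. It assumes an in-memory SQLite database as a stand-in for the Airflow metadata database, so the PostgreSQL expression now()::timestamp(0) is replaced by an explicit `now` parameter; the helper name `count_expired` and the sample rows are hypothetical. The point is only that COUNT(*) returns a single integer, so no session rows are loaded into Python memory.

```python
import sqlite3


def count_expired(conn, now):
    # COUNT(*) is evaluated by the database, so only one integer reaches Python.
    row = conn.execute(
        "SELECT COUNT(*) FROM session WHERE expiry < ?", (now,)
    ).fetchone()
    return row[0]


def cleanup_sessions(conn, now):
    # Count before and after the DELETE without fetching any session rows.
    before = count_expired(conn, now)
    conn.execute("DELETE FROM session WHERE expiry < ?", (now,))
    conn.commit()
    after = count_expired(conn, now)
    return before - after


# Assumption: in-memory SQLite stand-in with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, expiry TEXT)")
conn.executemany(
    "INSERT INTO session (expiry) VALUES (?)",
    [
        ("2023-12-01 00:00:00",),
        ("2023-12-10 00:00:00",),
        ("2024-01-01 00:00:00",),
    ],
)

deleted = cleanup_sessions(conn, "2023-12-15 00:00:00")
remaining = conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]
print("Deleted {} expired sessions, {} remaining.".format(deleted, remaining))
```

With SQLAlchemy, the same query could presumably be passed through text() and read back as a scalar, but that variant is not tested here.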

alexhaas-otto, Jan 02 '24 11:01

Assigning to @michalmodras for further triage

leahecole, Mar 19 '24 19:03

I think that this exact change was already introduced: https://github.com/GoogleCloudPlatform/python-docs-samples/pull/11035

apilaskowski, Apr 03 '24 14:04

@leahecole I think we can safely close this issue. This question was posted in the meantime and didn't reach me, even though I was involved in reviewing that PR.

apilaskowski, Apr 03 '24 14:04

@apilaskowski You're right. I didn't see that PR 👍

alexhaas-otto, Apr 04 '24 07:04