Custom XCom Backends Sample Code has a Leak
Affected Document: https://www.astronomer.io/guides/custom-xcom-backends
Severity:
- [ ] Critical: This document has incorrect or missing information. Add the
error/criticallabel to this issue. - [X] FYI: The document is misleading or could benefit from additional context or guidance. Add the
error/FYIlabel to this issue. - [ ] Typo: Add the
error/typolabel to this issue.
Suggested Change (include links to relevant existing docs, support tickets, messages, or external content):
I consulted this document for help with implementing custom xcom backends. Specifically, I wanted to have a custom backend available for testing https://github.com/apache/airflow/pull/17405. It didn't work out very well because I couldn't figure out which storage bucket items went with which task ID's.
One thing that's kind of strange about the sample code is that it creates a uuid every time a new value is written to the custom backend. So if you rerun a task 100 times (perhaps via the CLI invocation: airflow tasks run {dag_id} {task_id} {run_id}), you end up with 101 entries in your storage bucket--100 of which have no mapping back to an airflow DAG.
We should consider mentioning that in addition to serialize_value and deserialize_value you can also implement set and clear so that your storage bucket items have the associated dag_id, task_id, etc. in their names. This way, when airflow clears the reference (currently a UUID stored in traditional XCOM) it also clears the associated data in the storage bucket--rather than just leaving it lying around.
Additional Notes: In case more context is helpful:
- I got some help from a community member here: https://github.com/apache/airflow/pull/17405#issuecomment-929962066
- The place in my custom XCOM code where I associate the
dag_id, etc. with the to-be-serialized value is here: https://github.com/astronomer/qa-scenario-dags/blob/master/include/ycom.py#L126