
fix: sqa deprecations for airflow tests

Open dondaum opened this issue 1 year ago • 5 comments

related: #28723

Fix SQLAlchemy 2.0 deprecations in Airflow tests.

@Taragolis I went through all test warnings. IMO, for the majority of the SQA 2.0 warnings there is nothing to do. That is often the case with the warnings 'object is being merged into a Session along the backref cascade...': SQA is warning that an object is pulled into the session via a backref cascade rather than explicitly. But we often compensate for that by explicitly adding the object to the session a few lines later, which should be fine. In such cases we still get a warning, but there won't be an issue later on. It is also explained here.
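
To illustrate what these warnings complain about, here is a minimal, self-contained sketch (toy Parent/Child models, not Airflow code; under SQLAlchemy 1.4 the RemovedIn20Warning is only emitted when SQLALCHEMY_WARN_20 is enabled):

from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")  # legacy backref

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent()
    session.add(parent)
    child = Child()
    # The backref cascade pulls `child` into the session here -> RemovedIn20Warning.
    child.parent = parent
    # The explicit add a few lines later compensates for it, so the warning is harmless.
    session.add(child)
    session.commit()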

All tests that raise warnings but should be fine:

  • tests/providers/fab/auth_manager/api_endpoints/test_user_schema.py:63
  • tests/jobs/test_backfill_job.py:1912
  • tests/models/test_dagrun.py:2090
  • tests/models/test_taskinstance.py:1487
  • tests/ti_deps/deps/test_trigger_rule_dep.py:140
  • tests/api_connexion/endpoints/test_task_instance_endpoint.py:190

Fixed but still raise a warning:

  • tests/serialization/test_pydantic_models.py:162
  • tests/serialization/test_pydantic_models.py:163

Fixed reported in tests



dondaum avatar Apr 28 '24 21:04 dondaum

Fixed but still raise a warning tests/serialization/test_pydantic_models.py:162 tests/serialization/test_pydantic_models.py:163

Could you provide what kind of error you tried to fix?

Taragolis avatar Apr 30 '24 15:04 Taragolis

Fixed but still raise a warning tests/serialization/test_pydantic_models.py:162 tests/serialization/test_pydantic_models.py:163

Could you provide what kind of error you tried to fix?

Both raise the following warnings

{"category": "sqlalchemy.exc.RemovedIn20Warning", "message": "\"DagScheduleDatasetReference\" object is being merged into a Session along the backref cascade path for relationship \"DatasetModel.consuming_dags\"; in SQLAlchemy 2.0, this reverse cascade will not take place.  Set cascade_backrefs to False in either the relationship() or backref() function for the 2.0 behavior; or to set globally for the whole Session, set the future=True flag (Background on this error at: https://sqlalche.me/e/14/s9r1) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", "filename": "tests/serialization/test_pydantic_models.py", "lineno": 162, "when": "runtest", "node_id": "tests/serialization/test_pydantic_models.py::test_serializing_pydantic_dataset_event", "group": "tests", "count": 1}
{"category": "sqlalchemy.exc.RemovedIn20Warning", "message": "\"TaskOutletDatasetReference\" object is being merged into a Session along the backref cascade path for relationship \"DatasetModel.producing_tasks

We can fix it by first initializing the object, then adding it to the Session, and only then setting the dataset, which is where the backref assignment happens.

@pytest.mark.skipif(not _ENABLE_AIP_44, reason="AIP-44 is disabled")
def test_serializing_pydantic_dataset_event(session, create_task_instance, create_dummy_dag):
    ...
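    # Adding the references to the session explicitly first and assigning the
    # relationship afterwards leaves nothing for the backref cascade to merge.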
    dag_ds_ref = DagScheduleDatasetReference(dag_id=dag.dag_id)
    session.add(dag_ds_ref)
    dag_ds_ref.dataset = ds1
    task_ds_ref = TaskOutletDatasetReference(task_id=task1.task_id, dag_id=dag.dag_id)
    session.add(task_ds_ref)
    task_ds_ref.dataset = ds1

Yet IMO that code looks less readable than initializing the whole object first and then adding it to the Session:

@pytest.mark.skipif(not _ENABLE_AIP_44, reason="AIP-44 is disabled")
def test_serializing_pydantic_dataset_event(session, create_task_instance, create_dummy_dag):
    ...
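    # Assigning dataset=ds1 in the constructors still goes through the backref
    # cascade once ds1 is session-bound, so these lines keep raising the warning.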
    dag_ds_ref = DagScheduleDatasetReference(dag_id=dag.dag_id, dataset=ds1)
    task_ds_ref = TaskOutletDatasetReference(task_id=task1.task_id, dag_id=dag.dag_id, dataset=ds1)
    session.add_all([dag_ds_ref, task_ds_ref])

dondaum avatar May 01 '24 21:05 dondaum

Set cascade_backrefs to False in either the relationship() or backref() function for the 2.0 behavior

I guess it points to a change in the model relationships; maybe it is similar to this: https://github.com/sqlalchemy/sqlalchemy/issues/6148
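
For context, a rough sketch of what that flag would look like on a 1.4-style relationship (simplified toy versions of the dataset models, not Airflow's actual definitions; per the warning text the flag can go on either relationship() or backref()):

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import backref, declarative_base, relationship

Base = declarative_base()

class DatasetModel(Base):
    __tablename__ = "dataset"
    id = Column(Integer, primary_key=True)
    # cascade_backrefs=False opts this relationship into the 2.0 behaviour:
    # assigning ref.dataset = ds no longer merges the reference into ds's Session.
    consuming_dags = relationship(
        "DagScheduleDatasetReference",
        backref=backref("dataset"),
        cascade_backrefs=False,
    )

class DagScheduleDatasetReference(Base):
    __tablename__ = "dag_schedule_dataset_reference"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, ForeignKey("dataset.id"))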

Taragolis avatar May 02 '24 07:05 Taragolis

Oh interesting. Your comment made me look at the SQLAlchemy docs again.

I thought we had already enabled SQA 2.0 style by setting future=True on the engine here, but it seems that this only enables the 2.0 style for engines and connections. For the ORM, to enable the future behavior in ORM Sessions, we also need to pass future=True to the sessionmaker object:

    Session = scoped_session(
        sessionmaker(
            autocommit=False,
            autoflush=False,
            bind=engine,
            expire_on_commit=False,
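            # future=True opts the ORM Session into 2.0-style behaviour, which also
            # turns off the backref save-update cascade that produces these warnings.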
            future=True
        )
    )

If I do, the warnings disappear. If I then remove my fixes, the tests fail. So this seems pretty useful for checking whether the changes/fixes work, and it might make sense in general to set future=True for the ORM session as well?

dondaum avatar May 02 '24 23:05 dondaum

There is one small thing: according to the docs, it is recommended to switch to future=True only after we resolve all RemovedIn20 warnings: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html#migration-to-2-0-step-five-use-the-future-flag-on-session

So maybe we need to start replacing legacy backrefs with back_populates (https://docs.sqlalchemy.org/en/14/orm/relationships.html), because this warning is related only to backref.
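
For reference, a rough sketch of the backref -> back_populates conversion (again simplified toy models, not the actual Airflow definitions):

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class DatasetModel(Base):
    __tablename__ = "dataset"
    id = Column(Integer, primary_key=True)
    # before (legacy): relationship("DagScheduleDatasetReference", backref="dataset")
    consuming_dags = relationship("DagScheduleDatasetReference", back_populates="dataset")

class DagScheduleDatasetReference(Base):
    __tablename__ = "dag_schedule_dataset_reference"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, ForeignKey("dataset.id"))
    # the reverse side now has to be declared explicitly
    dataset = relationship("DatasetModel", back_populates="consuming_dags")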

Taragolis avatar May 03 '24 01:05 Taragolis

Yes, legacy backrefs should be replaced with back_populates and explicit relationships anyway.

Yet we will still see those warnings, because even with back_populates and an explicit relationship, SQLAlchemy 1.4 still uses the old way of merging an object into the Session along the cascade path.

Just to make sure: we can fix all 'object is being merged into a Session along the backref cascade path' warnings with the current setup (NOT setting future=True on the Session object) when we

  • create the object first,
  • then add it to the session, and
  • only afterwards assign the relationship values where the backref/back_populates happens.

Similar to here. IMO it will still be less readable and understandable. It is also just a warning that does not lead to wrong behavior when handled correctly, and you can confirm that by using future=True in the Session.

WDYT? Shall I change it so that all warnings are fixed, or shall we acknowledge some warnings as false positives?

dondaum avatar May 03 '24 07:05 dondaum

I would rather fix them one by one than apply a fix which might hide this.

We still have third-party components which strictly depend on SA 1.4 and might not work correctly with the new sessions.

Taragolis avatar May 03 '24 09:05 Taragolis

In particular the one described in https://docs.sqlalchemy.org/en/14/orm/cascades.html#save-update. There is a description there of cascade_backrefs, which is True by default in 1.4 and False in 2.0.

Taragolis avatar May 03 '24 09:05 Taragolis

Ok. Changed. All RemovedIn20Warning should be fixed.

dondaum avatar May 03 '24 20:05 dondaum