warehouse icon indicating copy to clipboard operation
warehouse copied to clipboard

Cannot test database-created objects in the context of a webtest

Open miketheman opened this issue 1 year ago • 7 comments

Here's an example test case:

from tests.common.db.accounts import UserFactory

def test_user_profile(db_session, webtest):
    user = UserFactory.create()
    assert user.username
    # vist user's page
    resp = webtest.get(f"/user/{user.username}/")
    assert resp.status_code == 200

This fails with a 404 Not Found. I want to get a 200.

I tried following the guidance here https://docs.pylonsproject.org/projects/pyramid_tm/en/latest/#testing but I'm overlooking something with how we create sessions/transactions, and can't seem to figure out how to share the session from db_session into the webtest test app, since pyramid_tm will start a fresh transaction for the web request.

There might be something to restructure between our database, app_config fixtures to allow this - I'm a little lost, so I thought I'd write it down for anyone else to poke at.

miketheman avatar Mar 19 '24 21:03 miketheman

Oh, and the motivation behind this is that this kind of setup would allow us to write more functional (integration) tests that create a database state, and then perform actions via the web/api interface (a la browsing, logging in, uploading things, etc) with fewer mocks.

miketheman avatar Mar 19 '24 21:03 miketheman

Hi @miketheman - I opened a draft PR that doesn't fix anything, but maybe helps others reason with the code a bit.

I've at least confirmed that the db contains an object :-) In my projects I sometimes do db-level queries to assert table state. I also do cleanup prior to running tests, usually doing a simple and stupid for t in tables: delete_all_data_in(t). It's heavy-handed but it works.

The db_session fixture creates a session, webtest creates a tm, and pyramid creates a tm ... lots of potential sessions/tms.

jzohrab avatar Mar 31 '24 02:03 jzohrab

Thanks @jzohrab ! I haven't had a chance to pursue that branch, but maybe @di is interested, as he's tinkered with this problem as well.

miketheman avatar Apr 03 '24 17:04 miketheman

I took another look today, and am still scratching my head. Thanks for the draft branch, @jzohrab sadly it didn't make it much clearer.

There's a bunch of pieces involved, and trying to tease them apart with what I know so far. We use (from the bottom up?) these libraries:

  • psycopg
  • sqlalchemy
  • zope.sqlalchemy
  • pyramid_tm

Then we layer on our own logic in config.py and db.py.

THEN we create fixtures to take advantage of behaviors in tests:

  • database (test session-wide) - inits the database and returns a connection string
  • app_config (test session-wide) - uses database, creates a pyramid config with the conn string, and runs all db migrations
  • db_session (initiated per test) - uses app_config, gets the engine from config, connects, starts a transaction, creates a session, starts a nested transaction, does some SAVEPOINT magic, and yields the session. When the test ends, runs finally block and closes, tears down, etc.
  • webtest (initiated per test) - uses app_config, creates a customized version of webtest.TestApp to support xmlrpc calls and disable https requirement

So something in all of these layers is not joining the active transaction/session correctly, or potentially the step that inserts data into the database isn't closing, or sqlalchemy's implicit begin is causing the previous session data to not be visible yet? More effort needed here.

miketheman avatar Apr 12 '24 19:04 miketheman

Was reading some other SQLAlchemy docs today, and saw this, might have something to do with the issue: https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#joining-a-session-into-an-external-transaction-such-as-for-test-suites

miketheman avatar Apr 19 '24 20:04 miketheman

This is pretty much already what we're doing here: https://github.com/pypi/warehouse/blob/97d8de8a220218c11138ecdf44267862b4d6f65f/tests/conftest.py#L343-L366

I added join_transaction_mode="create_savepoint" but your failing test still didn't pass.

I also tried removing the session nesting that we have, in addition to that, but that didn't change anything either.

di avatar Apr 19 '24 20:04 di

The good news is I think we can use this to simplify the fixture a little bit: https://github.com/pypi/warehouse/pull/15826

di avatar Apr 19 '24 20:04 di

I've proposed solving this by reusing the same db session, rather than always creating a new db session in warehouse.db.

Approach was like this:

  1. Ran a logged container of Postgres (add command: ["postgres", "-c", "log_statement=all"] to the db service in docker-compose.yml) + docker compose logs db -f
  2. Noticed 2 different PIDs (unverified if that also means 2 different sessions, but log output is below)
  3. Had also wondered about the tutorials reusing the same db session: https://docs.pylonsproject.org/projects/pyramid/en/latest/tutorials/wiki2/tests.html
  4. Added a way to override creating a db session in warehouse.db and instead feed the db_session fixture: https://github.com/pypi/warehouse/pull/16031

In the output, I note that there are two different PIDs (3727 and 3791) mentioned. I haven't found any definitive documentation about whether the PID indicates a new connection or session, but it was enough to start playing with a hypothesis that the WSGI application had created a new db session/connection.

View postgres log

db-1  | 2024-05-31 09:17:15.642 UTC [3727] LOG:  execute <unnamed>: BEGIN ISOLATION LEVEL READ COMMITTED
db-1  | 2024-05-31 09:17:15.643 UTC [3727] LOG:  statement: SAVEPOINT sa_savepoint_1
db-1  | 2024-05-31 09:17:15.648 UTC [3727] LOG:  execute <unnamed>: INSERT INTO users (username, name, password, is_active, is_superuser, is_moderator, is_psf_staff, date_joined, last_login, disabled_for, totp_secret, last_totp_value, observer_association_id) VALUES ($1, $2::VARCHAR, $3::VARCHAR, $4, $5, $6, $7, $8::TIMESTAMP WITHOUT TIME ZONE, $9::TIMESTAMP WITHOUT TIME ZONE, $10, $11, $12::VARCHAR, $13::UUID) RETURNING users.password_date, users.is_frozen, users.is_observer, users.prohibit_password_reset, users.hide_avatar, users.sitemap_bucket, users.id
db-1  | 2024-05-31 09:17:15.648 UTC [3727] DETAIL:  parameters: $1 = 'kvtYJaeoytrC', $2 = 'movement', $3 = '!', $4 = 't', $5 = 'f', $6 = 'f', $7 = 'f', $8 = '2005-11-02 21:49:24.866345', $9 = '2023-09-30 08:53:29.62694', $10 = NULL, $11 = '\x02c807f5e8f6b992119ec8b127720796adf72276', $12 = NULL, $13 = NULL
db-1  | 2024-05-31 09:17:15.661 UTC [3727] LOG:  execute <unnamed>: SELECT users.username AS users_username, users.name AS users_name, users.password AS users_password, users.password_date AS users_password_date, users.is_active AS users_is_active, users.is_frozen AS users_is_frozen, users.is_superuser AS users_is_superuser, users.is_moderator AS users_is_moderator, users.is_psf_staff AS users_is_psf_staff, users.is_observer AS users_is_observer, users.prohibit_password_reset AS users_prohibit_password_reset, users.hide_avatar AS users_hide_avatar, users.date_joined AS users_date_joined, users.last_login AS users_last_login, users.disabled_for AS users_disabled_for, users.totp_secret AS users_totp_secret, users.last_totp_value AS users_last_totp_value, users.sitemap_bucket AS users_sitemap_bucket, users.observer_association_id AS users_observer_association_id, users.id AS users_id, user_emails_1.id AS user_emails_1_id, user_emails_1.user_id AS user_emails_1_user_id, user_emails_1.email AS user_emails_1_email, user_emails_1."primary" AS user_emails_1_primary, user_emails_1.verified AS user_emails_1_verified, user_emails_1.public AS user_emails_1_public, user_emails_1.unverify_reason AS user_emails_1_unverify_reason, user_emails_1.transient_bounces AS user_emails_1_transient_bounces 
db-1  | 	FROM users LEFT OUTER JOIN user_emails AS user_emails_1 ON users.id = user_emails_1.user_id 
db-1  | 	WHERE users.id = $1::UUID
db-1  | 2024-05-31 09:17:15.661 UTC [3727] DETAIL:  parameters: $1 = '4c0686b5-3f9b-4d40-a815-5693b23be08e'
db-1  | 2024-05-31 09:17:15.664 UTC [3727] LOG:  statement: select * from users
db-1  | 2024-05-31 09:17:15.674 UTC [3791] LOG:  execute <unnamed>: BEGIN ISOLATION LEVEL READ COMMITTED
db-1  | 2024-05-31 09:17:15.675 UTC [3791] LOG:  execute <unnamed>: SELECT admin_flags.id AS admin_flags_id, admin_flags.description AS admin_flags_description, admin_flags.enabled AS admin_flags_enabled, admin_flags.notify AS admin_flags_notify 
db-1  | 	FROM admin_flags 
db-1  | 	WHERE admin_flags.id = $1::VARCHAR
db-1  | 2024-05-31 09:17:15.675 UTC [3791] DETAIL:  parameters: $1 = 'read-only'
db-1  | 2024-05-31 09:17:15.677 UTC [3791] LOG:  statement: select * from users
db-1  | 2024-05-31 09:17:15.687 UTC [3791] LOG:  execute <unnamed>: SELECT users.username AS users_username, users.name AS users_name, users.password AS users_password, users.password_date AS users_password_date, users.is_active AS users_is_active, users.is_frozen AS users_is_frozen, users.is_superuser AS users_is_superuser, users.is_moderator AS users_is_moderator, users.is_psf_staff AS users_is_psf_staff, users.is_observer AS users_is_observer, users.prohibit_password_reset AS users_prohibit_password_reset, users.hide_avatar AS users_hide_avatar, users.date_joined AS users_date_joined, users.last_login AS users_last_login, users.disabled_for AS users_disabled_for, users.totp_secret AS users_totp_secret, users.last_totp_value AS users_last_totp_value, users.sitemap_bucket AS users_sitemap_bucket, users.observer_association_id AS users_observer_association_id, users.id AS users_id, user_emails_1.id AS user_emails_1_id, user_emails_1.user_id AS user_emails_1_user_id, user_emails_1.email AS user_emails_1_email, user_emails_1."primary" AS user_emails_1_primary, user_emails_1.verified AS user_emails_1_verified, user_emails_1.public AS user_emails_1_public, user_emails_1.unverify_reason AS user_emails_1_unverify_reason, user_emails_1.transient_bounces AS user_emails_1_transient_bounces 
db-1  | 	FROM users LEFT OUTER JOIN user_emails AS user_emails_1 ON users.id = user_emails_1.user_id 
db-1  | 	WHERE users.username = $1
db-1  | 2024-05-31 09:17:15.687 UTC [3791] DETAIL:  parameters: $1 = 'kvtYJaeoytrC'
db-1  | 2024-05-31 09:17:15.932 UTC [3791] LOG:  execute <unnamed>: ROLLBACK
db-1  | 2024-05-31 09:17:16.033 UTC [3727] LOG:  statement: ROLLBACK TO SAVEPOINT sa_savepoint_1
db-1  | 2024-05-31 09:17:16.034 UTC [3727] LOG:  execute <unnamed>: ROLLBACK
db-1  | 2024-05-31 09:17:16.040 UTC [3799] LOG:  statement: ALTER DATABASE "tests" with allow_connections false;
db-1  | 2024-05-31 09:17:16.046 UTC [3799] LOG:  execute <unnamed>: SELECT pg_terminate_backend(pg_stat_activity.pid)FROM pg_stat_activity WHERE pg_stat_activity.datname = $1;
db-1  | 2024-05-31 09:17:16.046 UTC [3799] DETAIL:  parameters: $1 = 'tests'
db-1  | 2024-05-31 09:17:16.047 UTC [3799] LOG:  statement: DROP DATABASE IF EXISTS "tests";

benjaoming avatar May 31 '24 10:05 benjaoming

I could be wrong, but I think CI failures like https://github.com/pypi/warehouse/actions/runs/11057423074/job/30721387457?pr=16778 indicate that the current approach isn't 100% right -- I haven't fully triaged it, but from what I can tell:

  1. In the integration tests, everything uses a transaction.TransactionManager instead of pyramid_tm. This gets plumbed in by overriding the default TM in TestApp.
  2. However, the "factory" helpers (e.g. ProjectFactory) manage their own sessions, which get pulled directly from a scoped_session that isn't tied to the custom TM.

For reference, here's the Session factory that the test model factories use:

https://github.com/pypi/warehouse/blob/684ff39c4b82a134ac1eb3ae4fd46e37fb13904f/tests/common/db/init.py#L17

and here's where it gets loaded into each model factory, via Meta:

https://github.com/pypi/warehouse/blob/684ff39c4b82a134ac1eb3ae4fd46e37fb13904f/tests/common/db/base.py#L18-L21

If I understand the lifetimes correctly here, the end result is that calls like ProjectFactory.create() are tied to session that isn't actually tracked by the integration test TM.

I can't tell if this is the actual root cause, but it seems like a decent candidate for the strange test behavior I've seen (e.g. where tests sometimes pass, sometimes fail because the models appear to not be updated, sometimes fail because a route that traverses them 404s instead of 200ing).

I think the fix for the above is to somehow bind these model factories' sessions to the custom TM, but I'm not sure how best to do that.

woodruffw avatar Sep 26 '24 21:09 woodruffw

@woodruffw that might indeed be the case. Here's something that I load in my Pytest db session fixtures:

@pytest.fixture
def db_session(...):

    ...

    # Ensure that all factories use the same session
    # From: https://github.com/tiangolo/sqlmodel/discussions/615
    for factory in SQLAlchemyModelFactory.__subclasses__():
        factory._meta.sqlalchemy_session = session

benjaoming avatar Sep 27 '24 14:09 benjaoming

Nice, thanks @benjaoming! I was trying to get something similar with patching _meta.sqlalchemy_session working, but I was overthinking it.

woodruffw avatar Sep 27 '24 14:09 woodruffw

Just sharing someone else's idea - open source ftw :tada:

benjaoming avatar Sep 27 '24 16:09 benjaoming