augur icon indicating copy to clipboard operation
augur copied to clipboard

frontend wont open when failed transaction not rolled back: sqlalchemy.exc.PendingRollbackError

Open MoralCode opened this issue 1 month ago • 4 comments

I encountered a weird exception when trying to load the frontend pages today. I got a 500 error before anything showed up on screen, even when just trying to get to the augur homepage.

It seems like this may have been caused by another exception that was not cleaned up/rolled back in the catch block.

We should check through our database calls to see if we can find all the places where we may not be correctly rolling back DB transactions on exceptions.

stack trace:

2025-10-30 16:24:38 f26c37058939 augur.api.server[406] ERROR Exception on / [GET]
augur-1  | Traceback (most recent call last):
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask/app.py", line 2073, in wsgi_app
augur-1  |     response = self.full_dispatch_request()
augur-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask/app.py", line 1518, in full_dispatch_request
augur-1  |     rv = self.handle_user_exception(e)
augur-1  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
augur-1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
augur-1  |                                                 ^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask/app.py", line 1516, in full_dispatch_request
augur-1  |     rv = self.dispatch_request()
augur-1  |          ^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask/app.py", line 1502, in dispatch_request
augur-1  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/augur/api/view/routes.py", line 76, in repo_table_view
augur-1  |     if current_user.is_authenticated:
augur-1  |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/werkzeug/local.py", line 436, in __get__
augur-1  |     obj = instance._get_current_object()
augur-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/werkzeug/local.py", line 565, in _get_current_object
augur-1  |     return self.__local()  # type: ignore
augur-1  |            ^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask_login/utils.py", line 26, in <lambda>
augur-1  |     current_user = LocalProxy(lambda: _get_user())
augur-1  |                                       ^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask_login/utils.py", line 346, in _get_user
augur-1  |     current_app.login_manager._load_user()
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask_login/login_manager.py", line 331, in _load_user
augur-1  |     user = self._load_user_from_request(request)
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/flask_login/login_manager.py", line 390, in _load_user_from_request
augur-1  |     user = self._request_callback(request)
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/augur/api/view/augur_view.py", line 103, in load_user_request
augur-1  |     token = db_session.query(UserSessionToken).filter(UserSessionToken.token == token, UserSessionToken.expiration >= current_time).first()
augur-1  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2748, in first
augur-1  |     return self.limit(1)._iter().first()  # type: ignore
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2847, in _iter
augur-1  |     result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
augur-1  |                                                   ^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2306, in execute
augur-1  |     return self._execute_internal(
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2188, in _execute_internal
augur-1  |     result: Result[Any] = compile_state_cls.orm_execute_statement(
augur-1  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
augur-1  |     result = conn.execute(
augur-1  |              ^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1416, in execute
augur-1  |     return meth(
augur-1  |            ^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
augur-1  |     return connection._execute_clauseelement(
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_clauseelement
augur-1  |     ret = self._execute_context(
augur-1  |           ^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1812, in _execute_context
augur-1  |     conn = self._revalidate_connection()
augur-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 669, in _revalidate_connection
augur-1  |     self._invalid_transaction()
augur-1  |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 659, in _invalid_transaction
augur-1  |     raise exc.PendingRollbackError(
augur-1  | sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back.  Please rollback() fully before proceeding (Background on this error at: https://sqlalche.me/e/20/8s2b)

MoralCode avatar Oct 30 '25 21:10 MoralCode

Could it be that the global db_session created in augur/api/server.py is never cleaned up between requests? The same session is used by login authentication callbacks and multiple view routes.

So, maybe when a database exception occurs anywhere in the app, this global session enters an invalid transaction state. And since there's no cleanup between requests, subsequent requests that try to use db_session (like the homepage auth check) hit the PendingRollbackError. Thoughts?

shlokgilda avatar Nov 17 '25 15:11 shlokgilda

no, i doubt it. the PendingRollbackError message basically suggests that the conditions for this bug appear to be:

  1. a database transaction is in progress
  2. it gets interrupted in such a way that the transaction is never finished, but also never gets rolled back and the session is kept open (possibly by postgres???)
  3. something else tries to connect to the DB and postgres doesnt let it because its waiting for the invalid/incomplete transaction to be rolled back

MoralCode avatar Nov 18 '25 20:11 MoralCode

This very easily could have just been a one-off issue though since the reproduction steps are really vague and hard to repeat

MoralCode avatar Nov 18 '25 20:11 MoralCode

I think I managed to reproduce something very similar to this issue (though not the exact same error). Let me know if this makes sense. I was able to trigger a related error by forcing a database exception:

  1. Started Gunicorn with a single worker (-w 1) to ensure requests hit the same session
  2. Added a temporary test route to trigger a DB error:
@app.route('/kill-db')
def kill_db():
    db_session.execute(text("SELECT 1/0"))  # forces division by zero
    return "BOOOM!!!"
  1. Hit /kill-db → got 500 Internal Server Error with psycopg2.errors.DivisionByZero
  2. Immediately hit / (homepage) → also got 500 with this error: sqlalchemy.exc.InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

The error happened in load_user_request when trying to do db_session.query(UserSessionToken)...

So this is InFailedSqlTransaction rather than PendingRollbackError, but I think they're essentially the same problem (not sure; might be mistaken?) But I think the session got into a broken state from a previous error and was never rolled back. The session then persists across requests (same worker) and breaks subsequent queries.

shlokgilda avatar Nov 18 '25 23:11 shlokgilda