
Rollback transactions properly

Open jonathanmetzman opened this issue 5 years ago • 1 comments

During the Halloween experiment, and once a few months ago (when @inferno-chromium upgraded the size of the db instance), we had to manually intervene in the db because FuzzBench could not properly recover from a failed transaction. We need to fix this. I'm >50% sure this is an issue that can/should be fixed. I'm not 100% sure of the details, but I want to mention it so it doesn't get lost.

jonathanmetzman, Nov 05 '20 14:11

Keeping the stack trace here for future fixes:

Traceback (most recent call last):
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
    conn = self._revalidate_connection()
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 470, in _revalidate_connection
    "Can't reconnect until invalid "
sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/src/experiment/measurer/measure_manager.py", line 99, in measure_loop
    all_trials_ended = scheduler.all_trials_ended(experiment)
  File "/work/src/experiment/scheduler.py", line 104, in all_trials_ended
    models.Trial.time_ended.is_(None)).all()
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3346, in all
    return list(self)
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3508, in __iter__
    return self._execute_and_instances(context)
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3533, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
    distilled_params,
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1207, in _execute_context
    e, util.text_type(statement), parameters, None, None
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
    conn = self._revalidate_connection()
  File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 470, in _revalidate_connection
    "Can't reconnect until invalid "
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT trial.id AS trial_id, trial.fuzzer AS trial_fuzzer, trial.experiment AS trial_experiment, trial.benchmark AS trial_benchmark, trial.time_started AS trial_time_started, trial.time_ended AS trial_time_ended, trial.preemptible AS trial_preemptible, trial.preempted AS trial_preempted 
FROM trial 
WHERE trial.experiment = %(experiment_1)s AND trial.time_ended IS NULL ORDER BY trial.id]
[parameters: [immutabledict({})]]

Basically, the dispatcher goes into a bad state and is unable to stop instances, etc. The only remaining option is to kill the experiments.

inferno-chromium, Nov 07 '20 15:11
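
For anyone picking this up: the error in the trace ("Can't reconnect until invalid transaction is rolled back") suggests a session was reused after a failed transaction without a rollback. One possible shape for a fix is a small context manager that always rolls back on error. This is only a sketch, not FuzzBench's actual code: `session_scope` and `demo` are hypothetical names, and the demo uses the standard library's `sqlite3` as a stand-in, since its connections expose the same `commit()`, `rollback()`, and `close()` methods that a SQLAlchemy `Session` does.

```python
import contextlib
import functools
import os
import sqlite3
import tempfile


@contextlib.contextmanager
def session_scope(session_factory):
    """Commit on success, roll back on any error, always close.

    Hypothetical helper: session_factory is any zero-argument callable that
    returns an object with commit(), rollback() and close(), for example a
    SQLAlchemy sessionmaker or a sqlite3 connection factory.
    """
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        # Roll back so the failed transaction is not left open for whoever
        # uses this session or connection next.
        session.rollback()
        raise
    finally:
        session.close()


def demo():
    """Show that a write made before a failure does not survive."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # sqlite3 stands in for the real database here (assumption).
        connect = functools.partial(sqlite3.connect,
                                    os.path.join(tmp_dir, 'demo.db'))

        with session_scope(connect) as conn:
            conn.execute('CREATE TABLE trial (id INTEGER PRIMARY KEY)')

        try:
            with session_scope(connect) as conn:
                conn.execute('INSERT INTO trial (id) VALUES (1)')
                raise RuntimeError('simulated failure mid-transaction')
        except RuntimeError:
            pass  # The scope has already rolled back and closed.

        # A later scope works normally and sees no partial write.
        with session_scope(connect) as conn:
            return conn.execute('SELECT COUNT(*) FROM trial').fetchone()[0]
```

If the measurer and scheduler queries went through something like this, a dropped connection would presumably still fail the current call, but the next loop iteration would start from a clean session instead of hitting the same error forever.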