fuzzbench
Rollback transactions properly
During the Halloween experiment, and once a few months ago (when @inferno-chromium upgraded the size of the db instance), we had to manually intervene in the db because Fuzzbench could not properly recover from a failed transaction. We need to fix this. I'm more than 50% sure this is an issue that can and should be fixed. I'm not 100% sure of the details, but I want to mention it so that it doesn't get lost.
Keeping stacktrace for future fixes
Traceback (most recent call last):
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
conn = self._revalidate_connection()
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 470, in _revalidate_connection
"Can't reconnect until invalid "
sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/work/src/experiment/measurer/measure_manager.py", line 99, in measure_loop
all_trials_ended = scheduler.all_trials_ended(experiment)
File "/work/src/experiment/scheduler.py", line 104, in all_trials_ended
models.Trial.time_ended.is_(None)).all()
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3346, in all
return list(self)
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3508, in __iter__
return self._execute_and_instances(context)
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3533, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1207, in _execute_context
e, util.text_type(statement), parameters, None, None
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
conn = self._revalidate_connection()
File "/work/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 470, in _revalidate_connection
"Can't reconnect until invalid "
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT trial.id AS trial_id, trial.fuzzer AS trial_fuzzer, trial.experiment AS trial_experiment, trial.benchmark AS trial_benchmark, trial.time_started AS trial_time_started, trial.time_ended AS trial_time_ended, trial.preemptible AS trial_preemptible, trial.preempted AS trial_preempted
FROM trial
WHERE trial.experiment = %(experiment_1)s AND trial.time_ended IS NULL ORDER BY trial.id]
[parameters: [immutabledict({})]]
Basically, the dispatcher goes into a bad state and is unable to stop instances, etc. The only remaining option is to kill the experiments.
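For reference, one possible shape for a fix is to wrap every database operation so that a failure always triggers a rollback before the exception propagates. This is only a minimal sketch, not existing Fuzzbench code: `rollback_on_error`, `FakeSession`, and `run_demo` are hypothetical names, and the only assumption about the session is that it exposes `commit()` and `rollback()`, as a SQLAlchemy session does. The fake session is there only so the sketch runs without a database.

```python
import contextlib


@contextlib.contextmanager
def rollback_on_error(session):
    """Commit on success; roll back and re-raise on any exception.

    `session` is assumed to be a SQLAlchemy-style session exposing
    commit() and rollback(). Hypothetical helper, not Fuzzbench code.
    """
    try:
        yield session
        session.commit()
    except Exception:
        # Without this rollback the session stays in an invalid transaction
        # and later queries keep failing until someone intervenes manually.
        session.rollback()
        raise


class FakeSession:
    """Stand-in for a real session that records which calls were made."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append('commit')

    def rollback(self):
        self.calls.append('rollback')


def run_demo():
    """Runs one successful and one failing block; returns the recorded calls."""
    session = FakeSession()
    with rollback_on_error(session):
        pass  # A query that succeeds.
    try:
        with rollback_on_error(session):
            raise RuntimeError('simulated failed query')
    except RuntimeError:
        pass  # The caller can log this and retry on the next loop iteration.
    return session.calls
```

With something like this around the query in `all_trials_ended`, the measure loop could catch the error, log it, and try again on the next iteration instead of staying stuck. Whether that is sufficient depends on how the session is shared across the dispatcher, which I have not verified.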