The celery worker can get stuck after a SQL error
Overview
We have already fixed this problem for the app:
@app.errorhandler(sqlalchemy.exc.SQLAlchemyError)
def error_handler(err):
    # Prevent the session from breaking because of an unhandled error
    # that was never rolled back
    # https://github.com/frictionlessdata/goodtables.io/issues/97
    log.info('Database session rollback by server error handler')
    database['session'].rollback()
    raise err
but it seems we have run into the same problem in our workers:
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
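One possible way to mirror the app fix in the workers is to wrap each task so that the session is rolled back whenever the task raises. This is only a minimal sketch under stated assumptions: `rollback_on_error` is a hypothetical helper (it does not exist in goodtables.io), and in the real worker `session` would presumably be `database['session']` with `error_types` set to `(sqlalchemy.exc.SQLAlchemyError,)`.

```python
import functools
import logging

log = logging.getLogger(__name__)


def rollback_on_error(session, error_types=(Exception,)):
    """Build a decorator that rolls back `session` when the wrapped task fails.

    Hypothetical helper, not part of goodtables.io. `session` only needs a
    rollback() method; `error_types` is the tuple of exceptions to react to.
    """
    def decorator(task_func):
        @functools.wraps(task_func)
        def wrapper(*args, **kwargs):
            try:
                return task_func(*args, **kwargs)
            except error_types:
                # Same idea as the app error handler: roll back first, then
                # re-raise so the task is still reported as failed.
                log.info('Database session rollback by task error handler')
                session.rollback()
                raise
        return wrapper
    return decorator
```

The decorator would sit directly above the task function and below the Celery task decorator, so that Celery registers the wrapped version. Re-raising keeps the failure visible in the logs while leaving the session usable for the next task.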
Quickfix
Restart goodtables.io pod:
kubectl get pod goodtables-io-production-xxxxxxxx-xxxx -n production -o yaml | kubectl replace --force -f -
Logs
Note the restart at 2018-08-01 08:29 GMT and the errors above it
https://console.cloud.google.com/logs/viewer?interval=PT1H&project=oki-cloud&minLogLevel=0&expandAll=false&timestamp=2018-08-01T08:42:22.153000000Z&customFacets=&limitCustomFacetWidth=true&advancedFilter=resource.type%3D%22container%22%0Aresource.labels.cluster_name%3D%22oki%22%0Aresource.labels.namespace_id%3D%22production%22%0Aresource.labels.project_id%3D%22oki-cloud%22%0Aresource.labels.zone:%22europe-west1-b%22%0Aresource.labels.container_name%3D%22goodtables-worker%22&scrollTimestamp=2018-08-01T08:29:35.000000000Z&dateRangeStart=2018-08-01T07:47:17.427Z&dateRangeEnd=2018-08-01T08:47:17.427Z
cc @amercader @brew
Should be related to #315
The "Can't reconnect until invalid transaction is rolled back" problem has happened again.
A similar issue is being experienced with the service. From the logs:
[2019-02-25 11:49:08,161: ERROR/MainProcess] Task handler raised error: StatementError("(sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back",)
The pod was restarted on the k8s infrastructure, as per the command above, and it is working again for now.