knowledge-repo
Posts don't show and can't be accessed when using MySQL DB
Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj
I'm trying to deploy knowledge repo with a database backend. The database is running on a different place than the knowledge_repo server and is accessed over an SSH tunnel.
It seems that Knowledge Repo can read from and write to this database just fine.
However, for whatever reason, no posts are showing up. It's not just in the feed: they also can't be accessed directly via their path. I looked in the database itself and the posts are there in the repository table; however, the posts table that Knowledge Repo creates itself is empty. If I try to re-add the posts I see in the repository table, I get the expected error that their paths already exist.
I tried also setting SQLALCHEMY_DATABASE_URI to use sqlite, but that didn't seem to help.
Another thing I tried was running a MySQL DB on a local docker container, and the same issue persists.
This is version 0.8.1 of Knowledge Repo, on a Mac, with Python 3, if that helps.
I'm sort of at a loss as to why this could be happening.
In the logs, I see stuff like this:
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT index_metadata.id AS index_metadata_id, index_metadata.type AS index_metadata_type, index_metadata.name AS index_metadata_name, index_metadata.value AS index_metadata_value, index_metadata.updated_at AS index_metadata_updated_at \nFROM index_metadata \nWHERE index_metadata.type = %s AND index_metadata.name = %s \n LIMIT %s'] [parameters: ('lock', 'master_check', 1)] (Background on this error at: http://sqlalche.me/e/e3q8)
During startup of knowledge repo I also see this:
WARNING:knowledge_repo.app.index:Master indexing thread has died. Restarting...
Looks like the errors go away when I switch to MySQL 5.6, rather than the original 5.7 I was using.
However, indexing still does not seem to take place and posts aren't showing up.
It looks like although the Lost Connection errors go away with `runserver`, they come back when I use `deploy`.
I tried switching to a Postgres DB and I get lost connection errors with that too: `server closed the connection unexpectedly`.
It's strange that this is only happening with the indexing query and other queries seem to be able to read and write into the DB. I can run the indexing query in something like SQLWorkbench/J without any problems. I wonder if that means it has something to do with how knowledge repo handles the DB connections.
👍 - I have a similar problem when running a `gunicorn`-driven Knowledge Repo server against a Postgres instance.
web_1 | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
web_1 | [SQL: 'SELECT knowledge_repo.index_metadata.id AS knowledge_repo_index_metadata_id, knowledge_repo.index_metadata.type AS knowledge_repo_index_metadata_type, knowledge_repo.index_metadata.name AS knowledge_repo_index_metadata_na
me, knowledge_repo.index_metadata.value AS knowledge_repo_index_metadata_value, knowledge_repo.index_metadata.updated_at AS knowledge_repo_index_metadata_updated_at \nFROM knowledge_repo.index_metadata \nWHERE knowledge_repo.index_metadat
a.type = %(type_1)s AND knowledge_repo.index_metadata.name = %(name_1)s \n LIMIT %(param_1)s'] [parameters: {'type_1': 'lock', 'name_1': 'master_check', 'param_1': 1}]
I explored this some more, and it's actually a combination of minor issues and lack of documentation.
- When using `deploy` with MySQL 5.6, indexing fails with the Lost Connection messages. However, using `runserver` works. The catch is that to make it work you have to delete any data in the `index_metadata` table that may have been left behind by `deploy`.
- Indexing also works with `deploy` when setting `--workers 0`, and indexing also works when triggered manually via `knowledge_repo`'s reindex command. This suggests that there is some issue specifically with multiple workers. So a good workaround is to deploy normally, turn off automatic indexing, and do it manually via a cron job that calls `reindex`.
- You also have to change the post status to `3` in the database (not sure how to do this in the UI or via the command line; `submit` only gets the status to `1`). I see no options in the `knowledge_repo` script, or direction in the webapp, about how to 'review' posts submitted for review and change their status. The `accept()` and `publish()` etc. methods seem not to be called by the current version of the web app or `knowledge_repo` script except specifically for webposts. One option is to manually navigate to `https://host:port/edit/project_name/path_to_post.kp`, but this requires that `project_name` posts are in the list of posts allowed to be edited online (this can be specified in config, and only `webposts` is allowed by default). A one-off SQL sketch for flipping the status is shown after this list.
- There is an incompatibility with MySQL 5.7's changes to strict mode. This causes indexing to fail when using `runserver` (and also `deploy`), unlike MySQL 5.6, where it works with `runserver`.
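For reference, a minimal sketch of the kind of one-off status change described in the third bullet above, assuming the webapp's posts table has the integer status column discussed in this thread (1 = submitted, 3 = published). The connection URI and post path are placeholders, and this is not an official `knowledge_repo` command:

```python
# Hypothetical one-off script: mark a submitted post as published by setting
# status = 3 directly in the webapp's `posts` table (values per this thread).
# The connection URI and post path below are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+mysqlconnector://user:password@host/knowledge_repo")

with engine.begin() as conn:  # begin() commits on successful exit
    conn.execute(
        text("UPDATE posts SET status = :status WHERE path = :path"),
        {"status": 3, "path": "project_name/path_to_post.kp"},
    )
```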
Basically this is not currently ready to run reliably with a DB backend, but it also seems like it's pretty close.
So, I removed MySQLdb and changed over to mysqlconnect and now deploy works but runserver has issues with the query.
In addition, the code that checks whether a repository is up to date doesn't work. After the initial run of deploy it seems to always return false, leading to new posts published via the web UI not showing up unless the revision data is manually cleared from the index table.
Part of it is because the value in the index_metadata table that tracks the index master doesn't always reset when the app is restarted. And there's some issue with the repo version too - I set it to print out values, and it seems like when I made a post the version updated to the date of the post, but then reverted back to the previous date.
The DB Backend setup really needs a lot more testing it seems. Any plans to update it? Or maybe provide any insight here on how to go about fixing these issues?
Next, I updated the `revision` function in dbrepository to this:
def revision(self):
    result = self.session.query(func.max(self.PostRef.updated_at)).first()
    self.session.commit()
    return str(result[0])
It just adds a session.commit(), which takes care of the problem with the repo versions reverting back to what they were, and of the indexing failing to create paths to newly published posts. Apparently without the .commit() it was always returning the initial value of the query, and I'm not sure how long that cached value takes to expire.
I'm also not sure if this approach introduces any other problems.
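For what it's worth, the stale value is consistent with MySQL's default REPEATABLE READ isolation: a session that never ends its transaction keeps reading the snapshot taken when that transaction began, so the commit() ends the old snapshot. A hedged alternative, assuming the same DBRepository context as the method above, would be to end the read transaction with a rollback instead of a commit:

```python
# Sketch of an alternative fix (same assumed context as the revision() method
# above): rollback() ends the session's long-lived transaction, so the next
# query starts a fresh snapshot and sees newly published posts.
def revision(self):
    self.session.rollback()  # discard the stale read snapshot; nothing is written
    result = self.session.query(func.max(self.PostRef.updated_at)).first()
    return str(result[0])
```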
I'm not sure of the best way to deal with the leftover data in the `index_metadata` table after closing the server app, which sometimes (but not always) causes indexing to not work. But it's easy enough to clear that table before running the server each time.
Almost ready to deploy :)
Thanks for sharing your experience @CPapadim . To add a little more context to this scenario, before deploying with Postgres and gunicorn, I have not seen any indexing issues when running a deploy with a flask-backed in-memory instance of sqlite, which seems to suggest that the issue indeed may have to do with multiple workers, as flask and sqlite are single-threaded.
I'll give your change to the revision function a shot and report back on whether it fixes the problems on my side. Thanks again for documenting your debugging efforts!
Update: No changes.
I suspect all of the indexing issues come down to improperly handled connections, with different packages and databases handling those situations more or less gracefully. Going through the code and closing and reinitializing connections every time would probably solve the issues, though I'm not sure what sort of performance overhead it might introduce. I'd love the authors to at least weigh in on approaches, since I don't have a good grasp of the Knowledge Repo codebase yet.
The fix I posted above goes a long way though - it also fixes the runserver indexing error I get with the mysqlconnect package, meaning that at least for me, the combination of:
- mysqlconnect (instead of MySQLdb)
- MySQL 5.6
- the changes to the `revision()` function I posted

solves all of the errors except the issue with bad leftover data in the index_metadata table.
Exciting times :)
Hi @CPapadim. Thanks for being so detailed in your write-up about all of this, and sorry for the huge delay in my response, and for the difficulty you've had setting it all up.
I've mainly tested the indexing code with local and Amazon RDS MySQL instances, and never run into these problems; but I know that there are a large number of moving parts, and so am not surprised that something has gone wrong in some configurations. The library I've been using to connect to MySQL/MariaDB has been provided by the mysqlclient package, which is a compatible fork of MySQLdb that improves on several things (and makes it Python 3 compatible). I'd be interested to know if using this library solves your issues.
I currently don't have a huge amount of time to put into exploring different configurations and what could be going wrong, but one easy thing worth exploring might be to simply explicitly reset the database session immediately after forking the process (just in case that is not done already, and different processes are trying to use the same session) by adding something like db_session.close() just after https://github.com/airbnb/knowledge-repo/blob/master/knowledge_repo/app/index.py#L40 . If you have some time to try it out, I'd also be interested to see if that solves your problem.
If neither of these things work for you, I'll try to find some time to dig through it. My first hurdle will be reproducing your problems, since, as I noted, I've not seen these issues before.
As far as reproducibility, I think we're not that far off. I'm also using Python 3 and AWS RDS instances, with the exception that I'm also using an SSH tunnel to them, which I don't think is responsible for any of the issues and can likely be ignored.
I think I was using the package you mentioned - for Python 3, MySQLdb is part of the mysqlclient package - though I'm not sure if I installed it from PyPI or directly from GitHub, which could have led to the issues. I can test again when I get a chance and see. This library caused the initial issues I mentioned, which were fixed when switching over to mysqlconnect - which did introduce other issues.
Roger that. Hopefully I'll get some time to spend on debugging some of these issues, but I'm not optimistic that I will find time for at least another couple of weeks. Let me know if you make any progress on your end :).
I tried putting db_session.close() where suggested, using the mysqlclient package, and I still get an error while indexing. So right now I can only get indexing to work with the mysqlconnect package, but that package is very slow for me.
It would be great if you could look into this or offer suggestions - I fear at this rate I may have to abandon the project of deploying KR. As awesome as the features are, I haven't had luck getting it to stand up.
Hey @matthewwardrop, can you check your sqlalchemy version? I did some tests, and indexing partly works with sqlalchemy 1.1.x versions (and earlier), whereas all 1.2.x versions fail. Looks like something may have changed in 1.2.x that breaks KR.
The latest version that works is 1.1.18, which is the latest in the 1.1.x line.
So maybe upgrading the sqlalchemy version will allow you to reproduce the issues and fix them, or at least add sqlalchemy 1.1.x as a requirement.
When I say it 'partly' works though, I mean it works as long as no other queries are run. But when, for example, a user visits the KR page, triggering a query to list the posts, or performs other similar actions, indexing will again start to fail. Basically, any query that pulls content will break indexing, and all subsequent indexing attempts will then fail.
That's very interesting behavior. Querying the DB for posts seems to continue to work fine, but specifically querying for indexing starts failing after the first time posts are queried, or some other query happens. I wonder if the connections of the get_posts() query and other queries are somehow colliding with the indexing thread's connection because they are in separate threads and not aware of each other.
Finally some progress!
It appears db.session.close() doesn't actually close connections when using a pool; it just returns them to the pool. Then they somehow collide with connections from other queries, like the one in get_posts, when using multiprocessing.
I added db.engine.dispose() at the start of all models.py IndexMetadata() methods, which moved the error to the dbrepository.py revision() method. I then added self.engine.dispose() at the start of that method too, and now indexing seems to work, including with the latest version of sqlalchemy.
As far as I can tell those are the only places where db connections are made during indexing.
While indexing works, I haven't yet tested to see if this broke anything else.
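For anyone following along, this is roughly the shape of the change, as a sketch reconstructed from the description above rather than a verbatim copy of the project's code:

```python
# 1. At the start of each IndexMetadata method in knowledge_repo/app/models.py,
#    per the description above:
#        db.engine.dispose()
#
# 2. At the start of the revision() method in dbrepository.py:
def revision(self):
    self.engine.dispose()  # open fresh connections instead of reusing pooled ones
    result = self.session.query(func.max(self.PostRef.updated_at)).first()
    self.session.commit()
    return str(result[0])
```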
Implemented the proposed changes on our local copy of the Knowledge Repo server and also saw the resolution - indexing now works great! Have not seen any issues arise since then. @CPapadim should these changes be wrapped up into a PR and merged into master?
Thanks again @CPapadim for spending so much time fine-tuning this. The changes look good to me, and I've merged them all in. I'll release a new version soon. Thanks @irdvl also for testing!
@CPapadim So... I had to revert this because (of course) it breaks SQLite in memory databases, where the database is thrown away when you dispose of the engine. 🤕
I'm testing out some similar ideas and will put up a PR soon to see if it solves your issues too.
I think this is resolved by #457. Can anyone else confirm that it solves their issues? I'll close this issue assuming that it has based on my testing, but if it has not, please reopen. Thanks again everyone!
Sorry to reopen this thread, but it seems that this is still an issue with the latest release of Knowledge Repo.
I've put together a repository using docker-compose that should make it easy to reproduce: https://github.com/gthomas-slack/knowledge-repo-docker
Steps to get it running:
- Install Docker (https://www.docker.com/)
- git clone https://github.com/gthomas-slack/knowledge-repo-docker.git
- `make build` -- to build the knowledge-repo docker image
- `make run` -- to run the docker-compose
- To stop the docker containers use `ctrl-c`
- To reset to a clean state, including deleting mysql data, use `make clean`
The webserver should become accessible on http://localhost:7000. However, you will see that as the server tries to index the git repo, it encounters mysql errors that seem to be related to threads/concurrency as described earlier in this thread.
knowledge-app | git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
knowledge-app | cmdline: git checkout master
knowledge-app | stderr: 'fatal: Unable to create '/app/knowledge-example/.git/index.lock': File exists.
knowledge-app |
knowledge-app | Another git process seems to be running in this repository, e.g.
knowledge-app | an editor opened by 'git commit'. Please make sure all processes
knowledge-app | are terminated then try again. If it still fails, a git process
knowledge-app | may have crashed in this repository earlier:
knowledge-app | remove the file manually to continue.'
knowledge-app | File "/usr/local/lib/python3.5/site-packages/MySQLdb/connections.py", line 224, in query
knowledge-app | _mysql.connection.query(self, query)
knowledge-app | sqlalchemy.exc.OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
knowledge-app | (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
knowledge-app | [SQL: UPDATE posts SET uuid=%s, path=%s, project=%s, repository=%s, revision=%s, title=%s, subtitle=%s, tldr=%s WHERE posts.id = %s]
knowledge-app | [parameters: (b'a1ca3d52-9553-4bd9-8278-3bd0a0a7f193', 'project/example_rmd.kp', None, '/app/knowledge-example', 1, 'This is a Knowledge Template Header', None, 'This is short description of the content and findings of the post.', 2)]
knowledge-app | (Background on this error at: http://sqlalche.me/e/e3q8)
knowledge-app |
when trying to access http://localhost:7000 :
knowledge-app | File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 463, in _revalidate_connection
knowledge-app | "Can't reconnect until invalid "
knowledge-app | sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
knowledge-app | [SQL: SELECT index_metadata.id AS index_metadata_id, index_metadata.type AS index_metadata_type, index_metadata.name AS index_metadata_name, index_metadata.value AS index_metadata_value, index_metadata.updated_at AS index_metadata_updated_at
knowledge-app | FROM index_metadata
knowledge-app | WHERE index_metadata.type = %s AND index_metadata.name = %s
knowledge-app | LIMIT %s]
knowledge-app | [parameters: [{}]]
I also tried the patches suggested in this thread: https://github.com/airbnb/knowledge-repo/compare/master...gthomas-slack:master
This branch alters the docker image to use those patches: https://github.com/gthomas-slack/knowledge-repo-docker/tree/with-db-patches
However it still fails with similar errors.
Hi @gthomas-slack - sorry for our delayed response here and thank you for your well-prepared bug report. This tooling is largely "as is" as the main developers have moved on to companies where we don't use Knowledge Repo.
I'm able to reproduce, and it looks like there's some combination of an unreleased lock and perhaps a stampeding-herd problem going on. But ultimately, after the server starts, those errors seem to calm down as the workers de-synchronize, and the webserver is able to serve existing posts. However, index updating is still blocked. Is this your experience as well?
cc @victor-vazquez @matthewwardrop
Hi @naoyak - @robjwang encouraged me to give this another try to see if things have changed / improved. I used the same docker setup referenced above, except switched it to use the latest master from this repo. Unfortunately I still experienced the same issues :(
It does seem that the issue is related to multiple concurrent workers. If I set it to spin up with --workers 1 I can actually get into the app (though it responds slowly). With anything more than 1 worker, the web UI never seems to become available for me.
Let me know if you experience the same issue.
@bulam can you help here?
hi @gthomas-slack I looked into this last week and this is the summary of what I found:
- KR uses gunicorn for distributing processing across multiple workers. When using a SQL db backend, the gunicorn workers try to access the db simultaneously. However, each attempt locks the other attempts out, typically blocking SQL db access altogether.
- The same problem happens when KR tries to autoindex with git pull. The gunicorn workers complain about multiple git processes running simultaneously.

To address these issues, I'm currently putting together a docker compose file with a SQL db backend that does the following (this is actually how we have it implemented internally at Airbnb as well):
- Disable the KR built-in auto-indexing, to avoid the gunicorn multi-worker concurrency issues
- Instead, use a cron job that runs git pull & knowledge_repo reindex on a regular schedule (e.g. every X mins); a small sketch of such a job is shown after this list
- Additionally, include a webhook to a git repo that triggers a reindex whenever a new PR is merged.
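As a rough sketch of such a reindex job (the checkout path and the exact CLI invocation are placeholders, not the final compose setup):

```python
#!/usr/bin/env python3
# reindex_job.py -- sketch of a cron-driven reindex along the lines described
# above. The checkout path is a placeholder; adjust the CLI flags to your setup.
import subprocess

REPO_DIR = "/srv/knowledge-example"  # placeholder: local checkout of the git knowledge repo

# Pull the latest posts, then rebuild the web app's index.
subprocess.run(["git", "-C", REPO_DIR, "pull"], check=True)
subprocess.run(["knowledge_repo", "--repo", REPO_DIR, "reindex"], check=True)
```

A crontab entry would then just run this script every few minutes.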
I have this working and am just testing a bit more and cleaning up the code. Planning to share by end of week.
Thanks for investigating @bulam ! Let me know if/when you would like some help testing/validating the fixes, happy to try it out from my side.
Hey @bulam, have you already shared your solution somewhere? I am currently using KR with a mysql DB backend. I didn't manage to fix the indexing issues with the additions that @CPapadim provided; instead I fixed them by removing/commenting out the short-circuit condition in knowledge_repo -> app -> index.py:
# Short-circuit if necessary
# if not force and (not is_index_master or index_up_to_date()):
#     return False
Maybe that's useful for somebody.
@doepking sorry got busy and haven't gotten a chance to wrap up the image yet. Hoping to spend some time on this soon
@bulam - Does Airbnb's deployment of KR use Gunicorn?
I have a suspicion that the root cause of these various problems is that the connection pool is being instantiated before the gunicorn processes are forked. This results in connections being shared across processes, which causes a bunch of problems. https://davidcaron.dev/sqlalchemy-multiple-threads-and-processes/
I'm not super familiar with how bin/knowledge_repo works, but it seems like some changes are necessary there to ensure that the process fork happens before the database pool is stood up.
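One way to test that suspicion, if the deploy path can be pointed at a gunicorn config file, would be a post_fork hook that throws away whatever pool the parent process built. The import path for the app's db object below is a guess, not the actual module layout:

```python
# gunicorn_conf.py -- hedged sketch; assumes the Flask-SQLAlchemy `db` object
# can be imported from the app package (the import path below is hypothetical).
def post_fork(server, worker):
    from knowledge_repo.app.models import db  # hypothetical import path

    # Discard connections inherited from the master process so each worker
    # builds its own pool after the fork instead of sharing sockets.
    db.engine.dispose()
```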
I think this is happening because I'm able to replicate @gthomas-slack 's issues locally, even without running the re-index job. Thus, the only thing running is the webserver, and yet I'm still seeing errors like:
[2021-03-15 15:46:39 +0000] [98] [ERROR] Error handling request /static/modules/bootstrap-slider/js/bootstrap-slider.min.js
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 779, in _getter
getter = self._metadata._getter
AttributeError: 'NoneType' object has no attribute '_getter'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1945, in full_dispatch_request
self.try_trigger_before_first_request_functions()
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1993, in try_trigger_before_first_request_functions
func()
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 202, in ensure_excluded_tags_exist
Tag(name=tag)
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 45, in __new__
return unique_cache(
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 19, in unique_cache
obj = q.first()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
return list(res)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 100, in instances
cursor.close()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 58, in instances
*[
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 59, in <listcomp>
query_entity.row_processor(query, context, cursor)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4422, in row_processor
_instance = loading._instance_processor(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 421, in _instance_processor
getter = result._getter(col, False)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 781, in _getter
return self._non_result(None, err)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1236, in _non_result
util.raise_(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1879, in handle_exception
server_error = handler(server_error)
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 191, in show_traceback
return render_template('error.html')
File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 136, in render_template
ctx.app.update_template_context(context)
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 838, in update_template_context
context.update(func())
File "/usr/local/lib/python3.8/site-packages/flask_login/utils.py", line 379, in _user_context_processor
return dict(current_user=_get_user())
File "/usr/local/lib/python3.8/site-packages/flask_login/utils.py", line 346, in _get_user
current_app.login_manager._load_user()
File "/usr/local/lib/python3.8/site-packages/flask_login/login_manager.py", line 331, in _load_user
user = self._load_user_from_request(request)
File "/usr/local/lib/python3.8/site-packages/flask_login/login_manager.py", line 390, in _load_user_from_request
user = self._request_callback(request)
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 120, in load_user_from_request
user = User(identifier=user_attributes['identifier'])
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 45, in __new__
return unique_cache(
File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 19, in unique_cache
obj = q.first()
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
return list(res)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2006, 'MySQL server has gone away')
This traceback leads me to believe that multiple processes are trying to use the same connection. I think the second condition in this issue applies
Leaving an update for future folks experiencing this:
We added the following lines (plus the NullPool import) to our server_config.py:

from sqlalchemy.pool import NullPool

SQLALCHEMY_ENGINE_OPTIONS = {
    'poolclass': NullPool,
}
This forces SQLAlchemy to use a NullPool, which implies "use a brand new connection every time you do anything with the database". We've stopped seeing the tracebacks posted above and the failures to connect to the service.
As a note, there's an issue with Flask-SQLAlchemy that prevents NullPool from being set correctly. We've deployed using this fork of Flask-SQLAlchemy.
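For anyone wondering what the NullPool setting means in practice, here is the same idea expressed in plain SQLAlchemy (the URI is a placeholder): every checkout opens a brand-new connection and it is closed on release, so nothing is ever shared across gunicorn workers.

```python
# Plain-SQLAlchemy illustration of the NullPool behavior described above.
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

engine = create_engine(
    "mysql+mysqldb://user:password@host/knowledge_repo",  # placeholder URI
    poolclass=NullPool,  # no pooling: fresh connection per checkout, closed on release
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # each use gets its own short-lived connection
```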