
Posts don't show and can't be accessed when using MySQL DB

Open CPapadim opened this issue 6 years ago • 29 comments

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

I'm trying to deploy Knowledge Repo with a database backend. The database is running in a different location than the knowledge_repo server and is accessed over an SSH tunnel.

It seems that Knowledge Repo can read from and write to this database just fine.

However, for whatever reason, no posts are showing up. It's not just in the feed; they also can't be accessed directly via their path. I looked in the database itself and the posts are there in the repository table; however, the posts table that Knowledge Repo creates itself is empty. If I try to re-add the posts I see in the repository table, I get the expected error that their paths already exist.

I also tried setting SQLALCHEMY_DATABASE_URI to use SQLite, but that didn't seem to help.

Another thing I tried was running a MySQL DB in a local Docker container, and the same issue persists.

Version 0.8.1 of Knowledge Repo, on a Mac, with python 3 if that helps.

I'm sort of at a loss as to why this could be happening.

In the logs, I see stuff like this:

    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT index_metadata.id AS index_metadata_id, index_metadata.type AS index_metadata_type, index_metadata.name AS index_metadata_name, index_metadata.value AS index_metadata_value, index_metadata.updated_at AS index_metadata_updated_at \nFROM index_metadata \nWHERE index_metadata.type = %s AND index_metadata.name = %s \n LIMIT %s'] [parameters: ('lock', 'master_check', 1)] (Background on this error at: http://sqlalche.me/e/e3q8)

During startup of knowledge repo I also see this:

WARNING:knowledge_repo.app.index:Master indexing thread has died. Restarting...

CPapadim avatar Jun 09 '18 05:06 CPapadim

Looks like the errors go away when I switch to MySQL 5.6, rather than the original 5.7 I was using.

However, indexing still does not seem to take place and posts aren't showing up.

CPapadim avatar Jun 11 '18 04:06 CPapadim

It looks like, although the Lost Connection errors go away with runserver, they come back when I use deploy.

I tried switching to a Postgres DB and I get lost-connection errors with that too: server closed the connection unexpectedly

It's strange that this is only happening with the indexing query and other queries seem to be able to read and write into the DB. I can run the indexing query in something like SQLWorkbench/J without any problems. I wonder if that means it has something to do with how knowledge repo handles the DB connections.

CPapadim avatar Jun 11 '18 06:06 CPapadim

👍 - I have a similar problem when running a gunicorn-driven Knowledge Repo server against a Postgres instance.

web_1  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
web_1  |  [SQL: 'SELECT knowledge_repo.index_metadata.id AS knowledge_repo_index_metadata_id, knowledge_repo.index_metadata.type AS knowledge_repo_index_metadata_type, knowledge_repo.index_metadata.name AS knowledge_repo_index_metadata_na
me, knowledge_repo.index_metadata.value AS knowledge_repo_index_metadata_value, knowledge_repo.index_metadata.updated_at AS knowledge_repo_index_metadata_updated_at \nFROM knowledge_repo.index_metadata \nWHERE knowledge_repo.index_metadat
a.type = %(type_1)s AND knowledge_repo.index_metadata.name = %(name_1)s \n LIMIT %(param_1)s'] [parameters: {'type_1': 'lock', 'name_1': 'master_check', 'param_1': 1}]

irdvl avatar Jun 11 '18 22:06 irdvl

I explored this some more, and it's actually a combination of minor issues and lack of documentation.

  1. When using deploy with MySQL 5.6, indexing fails with the Lost Connection messages. However, using runserver works. The catch is that, to make it work, you have to delete any data in the index_metadata table that may have been left behind by deploy.

  2. Indexing also works with deploy when setting --workers 0, and indexing also works when triggered manually via knowledge_repo's reindex command. This suggests that there is some issue specifically with multiple workers. So a good workaround is to deploy normally, turn off automatic indexing, and do it manually via a cron job that calls reindex.

  3. You also have to change the post status to 3 in the database (I'm not sure how to do this in the UI or via the command line; submit only gets the status to 1). I see no options in the knowledge_repo script, or direction in the web app, about how to 'review' posts submitted for review and change their status. The accept() and publish() etc. methods don't seem to be called by the current version of the web app or knowledge_repo script except specifically for webposts. One option is to manually navigate to https://host:port/edit/project_name/path_to_post.kp, but this requires that project_name posts are in the list of posts allowed to be edited online (this can be specified in the config, and only webposts is allowed by default; see the config sketch after this list).

  4. There is an incompatibility with MySQL 5.7's changes to strict mode. This causes indexing to fail when using runserver (and also deploy), unlike MySQL 5.6, where it works with runserver.
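For item 3, here is a minimal server-config sketch of what allowing another project to be edited online might look like. Treat the variable name WEB_EDITOR_PREFIXES as an assumption about the config option rather than something confirmed in this thread:

# server_config.py sketch -- WEB_EDITOR_PREFIXES is assumed to be the option
# controlling which post prefixes can be edited in the web UI; by default only
# 'webposts' is allowed, so add your own project prefix alongside it.
WEB_EDITOR_PREFIXES = ['webposts', 'project_name']  # 'project_name' is a placeholder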

Basically this is not currently ready to run reliably with a DB backend, but it also seems like it's pretty close.

CPapadim avatar Jun 12 '18 06:06 CPapadim

So, I removed MySQLdb and changed over to mysqlconnect, and now deploy works but runserver has issues with the query.

In addition, the code that checks whether a repository is up to date doesn't work. After the initial run of deploy it seems to always return false, so new posts published via the web UI don't show up unless the revision data is manually cleared from the index table.

Part of it is because the value in the index_metadata table that tracks the index master doesn't always reset when the app is restarted. And there's some issue with the repo version too - I set it to print out values, and it seems like when I made a post, the version updated to the date of the post, but then reverted back to the previous date.

The DB backend setup really needs a lot more testing, it seems. Any plans to update it? Or could you maybe provide some insight here on how to go about fixing these issues?

CPapadim avatar Jun 19 '18 05:06 CPapadim

Next, I updated the revision function in dbrepository to this:

def revision(self):
    result = self.session.query(func.max(self.PostRef.updated_at)).first()
    # End the transaction so the next call starts a fresh one and sees newly
    # committed posts, rather than re-reading the session's stale snapshot.
    self.session.commit()
    return str(result[0])

It just adds a session.commit(), which takes care of the problem with the repo versions reverting back to what they were, and the indexing failing to create paths to newly published posts. Apparently without the .commit() it was always returning the initial value of the query, and I'm not sure how long that cached value takes to expire.

Also not sure if this approach introduces any other problems.

I'm not sure of the best way to deal with the leftover data in the index_metadata table after closing the server app, which sometimes (but not always) causes the indexing to not work. But it's easy enough to clear that table before running the server each time.

Almost ready to deploy :)

CPapadim avatar Jun 19 '18 21:06 CPapadim

Thanks for sharing your experience @CPapadim. To add a little more context to this scenario: before deploying with Postgres and gunicorn, I did not see any indexing issues when running a deploy with a Flask-backed in-memory instance of SQLite, which seems to suggest that the issue may indeed have to do with multiple workers, as Flask and SQLite are single-threaded.

I'll give your change to the revision function a shot and report back on whether it fixes the problems on my side. Thanks again for documenting your debugging efforts!

Update: No changes.

irdvl avatar Jun 20 '18 00:06 irdvl

I suspect all of the issues with indexing are just improperly handled connections, with different packages and databases handling some situations gracefully and others not so gracefully. Going through the code and closing and reinitializing connections every time would probably solve the issues, though I'm not sure what sort of performance overhead it might introduce. I'd love the authors to at least weigh in on approaches, since I don't have a good grasp of the Knowledge Repo codebase yet.

The fix I posted above goes a long way, though: it also fixes the runserver indexing error I get with the mysqlconnect package, meaning that at least for me, the combination of:

  1. mysqlconnect (instead of MySQLdb)
  2. MySQL 5.6
  3. the change to the revision() function I posted above

solves all of the errors except the issue with bad leftover data in the index_metadata table.

Exciting times :)

CPapadim avatar Jun 20 '18 07:06 CPapadim

Hi @CPapadim. Thanks for being so detailed in your write-up about all of this, and sorry for the huge delay in my response, and for the difficulty you've had setting it all up.

I've mainly tested the indexing code with local and Amazon RDS MySQL instances, and never run into these problems; but I know that there are a large number of moving parts, and so am not surprised that something has gone wrong in some configurations. The library I've been using to connect to MySQL/MariaDB has been provided by the mysqlclient package, which is a compatible fork of MySQLdb that improves on several things (and makes it Python 3 compatible). I'd be interested to know if using this library solves your issues.

I currently don't have a huge amount of time to put into exploring different configurations and what could be going wrong, but one easy thing worth exploring might be to simply explicitly reset the database session immediately after forking the process (just in case that is not done already, and different processes are trying to use the same session) by adding something like db_session.close() just after https://github.com/airbnb/knowledge-repo/blob/master/knowledge_repo/app/index.py#L40 . If you have some time to try it out, I'd also be interested to see if that solves your problem.
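As a rough, self-contained sketch of that idea (the names index_worker, do_index, and run_reindex are placeholders for illustration, not the actual index.py code):

import multiprocessing

def index_worker(db_session, do_index):
    # First thing in the freshly forked process: close the session inherited
    # from the parent, so SQLAlchemy opens a new connection rather than
    # reusing the parent's socket.
    db_session.close()
    do_index()

# usage sketch:
# multiprocessing.Process(target=index_worker, args=(db.session, run_reindex)).start()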

If neither of these things work for you, I'll try to find some time to dig through it. My first hurdle will be reproducing your problems, since, as I noted, I've not seen these issues before.

matthewwardrop avatar Jul 08 '18 05:07 matthewwardrop

As far as reproducibility, I think we're not that far off. I'm also using Python 3 and AWS RDS instances, with the exception that I'm also using an SSH tunnel to them, which I don't think is responsible for any of the issues and can likely be ignored.

I think I was using the package you mentioned - for Python 3, MySQLdb is part of the mysqlclient package, though I'm not sure if I installed it from PyPI or directly from GitHub, which could have led to the issues. I can test again when I get a chance and see. This library caused the initial issues I mentioned, which were fixed when switching over to mysqlconnect - which did introduce other issues.

CPapadim avatar Jul 08 '18 22:07 CPapadim

Roger that. Hopefully I'll get some time to spend on debugging some of these issues, but I'm not optimistic that I will find time for at least another couple of weeks. Let me know if you make any progress on your end :).

matthewwardrop avatar Jul 09 '18 16:07 matthewwardrop

I tried putting db_session.close() where suggested, using the mysqlclient package, and I still get an error while indexing. So right now I can only get indexing to work with the mysqlconnect package, but that package is very slow for me.

Would be great if you could look into this or offer suggestions - I fear at this rate I may have to abandon the project to deploy KR. As awesome as the features are, I haven't had luck getting it to stand up.

CPapadim avatar Jul 19 '18 23:07 CPapadim

Hey @matthewwardrop, can you check your SQLAlchemy version? I did some tests, and indexing partly works with SQLAlchemy 1.1.x versions (and previous versions also), whereas all 1.2.x versions fail. Looks like something may have changed in 1.2.x that breaks KR.

The latest version that works is 1.1.18, which is the latest in the 1.1.x line.

So maybe upgrading the sqlalchemy version will allow you to reproduce the issues and fix them, or at least add sqlalchemy 1.1.x as a requirement.

When I say it 'partly' works, though, I mean it works as long as no other queries are run. But when, for example, a user visits the KR page, triggering a query to list the posts, or performs other similar actions, indexing will again start to fail. Basically, any query that pulls content will then break indexing, and all subsequent indexing attempts will fail.

That's very interesting behavior. Querying the DB for posts seems to continue to work fine, but specifically querying for indexing starts failing after the first time posts are queried, or some other query happens. I wonder if the connections of the get_posts() query and other queries are somehow colliding with the indexing thread's connection, because they are in separate threads and not aware of each other.

CPapadim avatar Jul 21 '18 08:07 CPapadim

Finally some progress!

It appears db.session.close() doesn't actually close connections when using a pool; it just returns them to the pool. They then somehow collide with connections from other queries, like the one in get_posts, when using multiprocessing.

I added db.engine.dispose() at the start of all of the models.py IndexMetadata() methods, which moved the error to the dbrepository.py revision() method. I then added self.engine.dispose() at the start of that method too, and now indexing seems to work, including with the latest version of SQLAlchemy.
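Concretely, the revision() method from earlier ends up looking roughly like this (a sketch of my change, assuming self.engine is the repository's SQLAlchemy engine):

def revision(self):
    # Throw away pooled connections inherited from other processes/threads so
    # this call gets a fresh one, then query and commit as before.
    self.engine.dispose()
    result = self.session.query(func.max(self.PostRef.updated_at)).first()
    self.session.commit()
    return str(result[0])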

As far as I can tell those are the only places where db connections are made during indexing.

While indexing works, I haven't yet tested to see if this broke anything else.

CPapadim avatar Jul 22 '18 00:07 CPapadim

Implemented the proposed changes on our local copy of the knowledge server and also saw the resolution: indexing now works great! I have not seen any issues arise since then. @CPapadim, should these changes be wrapped up into a PR and merged into master?

irdvl avatar Aug 22 '18 23:08 irdvl

Thanks again @CPapadim for spending so much time fine-tuning this. The changes look good to me, and I've merged them all in. I'll release a new version soon. Thanks @irdvl also for testing!

matthewwardrop avatar Sep 17 '18 20:09 matthewwardrop

@CPapadim So... I had to revert this because (of course) it breaks SQLite in-memory databases, where the database is thrown away when you dispose of the engine. 🤕

I'm testing out some similar ideas and will put up a PR soon to see if it solves your issues too.

matthewwardrop avatar Sep 27 '18 22:09 matthewwardrop

I think this is resolved by #457. Can anyone else confirm that it solves their issues? I'll close this issue on the assumption that it does, based on my testing, but if it does not, please reopen. Thanks again everyone!

matthewwardrop avatar Sep 28 '18 16:09 matthewwardrop

Sorry to reopen this thread, but it seems that this is still an issue with the latest release of Knowledge Repo.

I've put together a repository using docker-compose that should make it easy to reproduce: https://github.com/gthomas-slack/knowledge-repo-docker

Steps to get it running:

  • Install Docker (https://www.docker.com/)
  • git clone https://github.com/gthomas-slack/knowledge-repo-docker.git
  • make build -- to build the knowledge-repo docker image
  • make run -- to run the docker-compose
  • To stop the docker containers use ctrl-c
  • To reset to a clean state, including deleting MySQL data, use make clean

The webserver should become accessible at http://localhost:7000; however, you will see that as the server tries to index the git repo, it encounters MySQL errors that seem to be related to threads/concurrency, as described earlier in this thread.

knowledge-app      | git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
knowledge-app      |   cmdline: git checkout master
knowledge-app      |   stderr: 'fatal: Unable to create '/app/knowledge-example/.git/index.lock': File exists.
knowledge-app      | 
knowledge-app      | Another git process seems to be running in this repository, e.g.
knowledge-app      | an editor opened by 'git commit'. Please make sure all processes
knowledge-app      | are terminated then try again. If it still fails, a git process
knowledge-app      | may have crashed in this repository earlier:
knowledge-app      | remove the file manually to continue.'
knowledge-app      |   File "/usr/local/lib/python3.5/site-packages/MySQLdb/connections.py", line 224, in query
knowledge-app      |     _mysql.connection.query(self, query)
knowledge-app      | sqlalchemy.exc.OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
knowledge-app      | (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
knowledge-app      | [SQL: UPDATE posts SET uuid=%s, path=%s, project=%s, repository=%s, revision=%s, title=%s, subtitle=%s, tldr=%s WHERE posts.id = %s]
knowledge-app      | [parameters: (b'a1ca3d52-9553-4bd9-8278-3bd0a0a7f193', 'project/example_rmd.kp', None, '/app/knowledge-example', 1, 'This is a Knowledge Template Header', None, 'This is short description of the content and findings of the post.', 2)]
knowledge-app      | (Background on this error at: http://sqlalche.me/e/e3q8)
knowledge-app      | 

When trying to access http://localhost:7000:

knowledge-app      |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 463, in _revalidate_connection
knowledge-app      |     "Can't reconnect until invalid "
knowledge-app      | sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
knowledge-app      | [SQL: SELECT index_metadata.id AS index_metadata_id, index_metadata.type AS index_metadata_type, index_metadata.name AS index_metadata_name, index_metadata.value AS index_metadata_value, index_metadata.updated_at AS index_metadata_updated_at 
knowledge-app      | FROM index_metadata 
knowledge-app      | WHERE index_metadata.type = %s AND index_metadata.name = %s 
knowledge-app      |  LIMIT %s]
knowledge-app      | [parameters: [{}]]

gthomas-slack avatar Jun 29 '19 02:06 gthomas-slack

I also tried the patches suggested in this thread: https://github.com/airbnb/knowledge-repo/compare/master...gthomas-slack:master

This branch alters the docker image to use those patches: https://github.com/gthomas-slack/knowledge-repo-docker/tree/with-db-patches

However it still fails with similar errors.

gthomas-slack avatar Jun 29 '19 02:06 gthomas-slack

Hi @gthomas-slack - sorry for our delayed response here and thank you for your well-prepared bug report. This tooling is largely "as is" as the main developers have moved on to companies where we don't use Knowledge Repo.

I'm able to reproduce, and it looks like there's some combination of an unreleased lock and perhaps a stampeding-herd problem going on. But ultimately, after the server starts, those errors seem to calm down as the workers de-synchronize, and the webserver is able to serve existing posts. However, index updating is still blocked. Is this your experience as well?

cc @victor-vazquez @matthewwardrop

danfrankj avatar Jul 22 '19 17:07 danfrankj

Hi @naoyak - @robjwang encouraged me to give this another try to see if things have changed / improved. I used the same docker setup referenced above, except I switched it to use the latest master from this repo. Unfortunately I still experienced the same issues :(

It does seem that the issue is related to multiple concurrent workers. If I set it to spin up with --workers 1 I can actually get into the app (though it responds slowly). With anything more than 1 worker, the web UI never seems to become available for me.

Let me know if you experience the same issue.

gthomas-slack avatar Jun 22 '20 23:06 gthomas-slack

@bulam can you help here?

naoyak avatar Jun 23 '20 00:06 naoyak

Hi @gthomas-slack, I looked into this last week and this is a summary of what I found:

  • KR uses gunicorn for distributing processing across multiple workers. When using a SQL db backend the gunicorn workers try to access the db simultaneously. However, each attempt locks the other attempts out, typically blocking SQL db access altogether.
  • The same problem happens when KR tries to autoindex with git pull. The gunicorn workers complain about multiple git processes running simultaneously.

To address these issues, I'm currently putting together a docker compose file with a SQL db backend that does the following (this is actually how we have it implemented internally at Airbnb as well):

  • Disable the KR built-in auto-indexing to avoid the gunicorn multi worker concurrency issues
  • Instead, use a cron job that runs git pull & knowledge_repo reindex on a regular schedule (e.g. every X mins); a sketch of such a script follows this list.
  • Additionally, include a webhook to a git repo that triggers a reindex whenever a new PR is merged.
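A minimal sketch of the kind of cron-driven reindex script described in the second bullet (illustrative only, not the actual Airbnb setup; the repo path and CLI flags are placeholders you would adjust for your deployment):

#!/usr/bin/env python3
# reindex_kr.py -- run from cron, e.g. every 10 minutes.
# The repo path below is a placeholder; adjust flags to match your deployment.
import subprocess

REPO = "/app/knowledge-example"

# Pull the latest posts from git, then ask knowledge_repo to rebuild the index.
subprocess.run(["git", "-C", REPO, "pull"], check=True)
subprocess.run(["knowledge_repo", "--repo", REPO, "reindex"], check=True)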

I have this working and am just testing it a bit more and cleaning up the code. Planning to share by end of week.

bulam avatar Jun 23 '20 01:06 bulam

Thanks for investigating @bulam ! Let me know if/when you would like some help testing/validating the fixes, happy to try it out from my side.

gthomas-slack avatar Jun 23 '20 16:06 gthomas-slack

Hey @bulam, have you already shared your solution somewhere? I am currently using KR with a MySQL DB backend. I didn't manage to fix the indexing issues with the additions that @CPapadim provided, but rather by removing/commenting out the short-circuit condition in knowledge_repo -> app -> index.py:

# Short-circuit if necessary
# if not force and (not is_index_master or index_up_to_date()):
#     return False

Maybe that's useful for somebody.

doepking avatar Oct 08 '20 13:10 doepking

@doepking Sorry, I got busy and haven't gotten a chance to wrap up the image yet. Hoping to spend some time on this soon.

bulam avatar Oct 14 '20 02:10 bulam

@bulam - Does Airbnb's deployment of KR use Gunicorn?

I have a suspicion that the root cause of these various problems is that the connection pool is being instantiated before the gunicorn processes are forked. This results in connections being shared across processes, which causes a bunch of problems: https://davidcaron.dev/sqlalchemy-multiple-threads-and-processes/. I'm not super familiar with how bin/knowledge_repo works, but it seems like some changes are necessary there to ensure that the process fork happens before the database pool is stood up.
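If that hypothesis is right, the usual fix is to throw away the inherited pool right after each worker forks, e.g. via gunicorn's post_fork server hook. A sketch only: the import path of the db handle is an assumption on my part, and since the knowledge_repo deploy command wraps gunicorn, wiring a config file in may take extra work.

# gunicorn_conf.py (passed with: gunicorn -c gunicorn_conf.py ...)
def post_fork(server, worker):
    # Dispose of the connection pool this worker inherited from the master
    # process, so it opens its own connections instead of sharing sockets.
    from knowledge_repo.app.models import db  # assumed location of the Flask-SQLAlchemy handle
    db.engine.dispose()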

I think this is happening because I'm able to replicate @gthomas-slack 's issues locally, even without running the re-index job. Thus, the only thing running is the webserver, and yet I'm still seeing errors like:

[2021-03-15 15:46:39 +0000] [98] [ERROR] Error handling request /static/modules/bootstrap-slider/js/bootstrap-slider.min.js
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 779, in _getter
    getter = self._metadata._getter
AttributeError: 'NoneType' object has no attribute '_getter'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1945, in full_dispatch_request
    self.try_trigger_before_first_request_functions()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1993, in try_trigger_before_first_request_functions
    func()
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 202, in ensure_excluded_tags_exist
    Tag(name=tag)
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 45, in __new__
    return unique_cache(
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 19, in unique_cache
    obj = q.first()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 100, in instances
    cursor.close()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 58, in instances
    *[
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 59, in <listcomp>
    query_entity.row_processor(query, context, cursor)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4422, in row_processor
    _instance = loading._instance_processor(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 421, in _instance_processor
    getter = result._getter(col, False)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 781, in _getter
    return self._non_result(None, err)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1236, in _non_result
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 134, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1879, in handle_exception
    server_error = handler(server_error)
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 191, in show_traceback
    return render_template('error.html')
  File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 136, in render_template
    ctx.app.update_template_context(context)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 838, in update_template_context
    context.update(func())
  File "/usr/local/lib/python3.8/site-packages/flask_login/utils.py", line 379, in _user_context_processor
    return dict(current_user=_get_user())
  File "/usr/local/lib/python3.8/site-packages/flask_login/utils.py", line 346, in _get_user
    current_app.login_manager._load_user()
  File "/usr/local/lib/python3.8/site-packages/flask_login/login_manager.py", line 331, in _load_user
    user = self._load_user_from_request(request)
  File "/usr/local/lib/python3.8/site-packages/flask_login/login_manager.py", line 390, in _load_user_from_request
    user = self._request_callback(request)
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/app.py", line 120, in load_user_from_request
    user = User(identifier=user_attributes['identifier'])
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 45, in __new__
    return unique_cache(
  File "/usr/local/lib/python3.8/site-packages/knowledge_repo/app/utils/models.py", line 19, in unique_cache
    obj = q.first()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2006, 'MySQL server has gone away')

This traceback leads me to believe that multiple processes are trying to use the same connection. I think the second condition in this issue applies.

traviscook21 avatar Mar 15 '21 16:03 traviscook21

Leaving an update for future folks experiencing this:

We added the following lines to our server_config.py:

from sqlalchemy.pool import NullPool  # this import is needed in server_config.py

SQLALCHEMY_ENGINE_OPTIONS = {
    'poolclass': NullPool,
}

This forces SQLAlchemy to use a NullPool, which means "use a brand-new connection every time you do anything with the database". We've stopped seeing the tracebacks posted above and the failures to connect to the service.

As a note, there's an issue with Flask-SQLAlchemy that prevents NullPool from being set correctly. We've deployed using this fork of Flask-SQLAlchemy.

traviscook21 avatar Mar 17 '21 21:03 traviscook21