dify icon indicating copy to clipboard operation
dify copied to clipboard

Optimization Suggestions for Database Connection Release Issues

Open secbr opened this issue 1 year ago • 6 comments

Self Checks

  • [X] I have searched for existing issues search for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
  • [X] 请务必使用英文提交 Issue,否则会被关闭。谢谢!:)
  • [X] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

The current project uses the database framework flask_sqlalchemy.

flask_sqlalchemy binds a database connection for each HTTP request. In a high-concurrency production environment, scenarios involving streaming returns or prolonged LLM returns occupy database connections for an extended period. For instance, an LLM request occupying more than 10 seconds would mean the entire database connection is engaged for over 10 seconds.

Eventually, this leads to a continuous increase in the number of database connections under high concurrency, until it hits the limit and exceptions occur.

Modification suggestions: (1) Decouple the database connection from HTTP requests. Release the connection immediately after querying or modifying data within an HTTP request. Re-acquire the connection for database operations as needed and release it immediately after the operation, instead of making long-duration blocking LLM requests while holding onto the database connection. (2) One of the best practices in programming is also to avoid making HTTP requests within a database connection or transaction, especially for those that take a long time.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • [ ] I am interested in contributing to this feature.

secbr avatar Jul 23 '24 11:07 secbr

the document here described how they use db.session, I think they already considered this high-concurrency scene.

hjlarry avatar Jul 24 '24 01:07 hjlarry

the document here described how they use db.session, I think they already considered this high-concurrency scene.

In theory, this is the case, but under high concurrency, there have already been issues of database connections being exhausted due to the prolonged response times of the LLM model.

secbr avatar Jul 24 '24 02:07 secbr

Additional information:

Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request rv = self.dispatch_request() File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 489, in wrapper resp = resource(*args, **kwargs) File "/app/api/.venv/lib/python3.10/site-packages/flask/views.py", line 110, in view return current_app.ensure_sync(self.dispatch_request)(**kwargs) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 604, in dispatch_request resp = meth(*args, **kwargs) File "/app/api/controllers/console/setup.py", line 71, in decorated elif not get_setup_status(): File "/app/api/controllers/console/setup.py", line 81, in get_setup_status return DifySetup.query.first() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2728, in first return self.limit(1)._iter().first() # type: ignore File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2827, in _iter result: Union[ScalarResult[_T], Result[_T]] = self.session.execute( File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2351, in execute return self._execute_internal( File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2226, in _execute_internal conn = self._connection_for_bind(bind) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2095, in _connection_for_bind return trans._connection_for_bind(engine, execution_options) File "", line 2, in _connection_for_bind File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go ret_value = fn(self, *arg, **kw) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1189, in _connection_for_bind conn = bind.connect() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3276, in connect return self._connection_cls(self) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 148, in init Connection._handle_dbapi_exception_noconnection( File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2440, in _handle_dbapi_exception_noconnection raise sqlalchemy_exception.with_traceback(exc_info[2]) from e File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 146, in init self._dbapi_connection = engine.raw_connection() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3300, in raw_connection return self.pool.connect() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 449, in connect return _ConnectionFairy._checkout(self) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 717, in checkout with util.safe_reraise(): File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in exit raise exc_value.with_traceback(exc_tb) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 715, in checkout dbapi_connection = rec.get_connection() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 866, in get_connection self.__connect() File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect with util.safe_reraise(): File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in exit raise exc_value.with_traceback(exc_tb) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect self.dbapi_connection = connection = pool._invoke_creator(self) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 643, in connect return dialect.connect(*cargs, **cparams) File "/app/api/.venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 620, in connect return self.loaded_dbapi.connect(*cargs, **cparams) File "/app/api/.venv/lib/python3.10/site-packages/psycopg2/init.py", line 122, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "10.1.0.11", port 5432 failed: FATAL: sorry, too many clients already

(Background on this error at: https://sqlalche.me/e/20/e3q8) 2024-07-23 01:55:30,415.415 ERROR [Dummy-3590] [app.py:838] - Exception on /console/api/datasets [GET] Traceback (most recent call last): File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request rv = self.dispatch_request() File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 489, in wrapper resp = resource(*args, **kwargs) File "/app/api/.venv/lib/python3.10/site-packages/flask/views.py", line 110, in view return current_app.ensure_sync(self.dispatch_request)(**kwargs) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 604, in dispatch_request resp = meth(*args, **kwargs) File "/app/api/controllers/console/setup.py", line 74, in decorated return view(*args, **kwargs) File "/app/api/libs/login.py", line 91, in decorated_view return current_app.ensure_sync(func)(*args, **kwargs) File "/app/api/controllers/console/wraps.py", line 21, in decorated return view(*args, **kwargs) File "/app/api/controllers/console/datasets/datasets.py", line 64, in get configurations = provider_manager.get_configurations( File "/app/api/core/provider_manager.py", line 125, in get_configurations custom_configuration = self._to_custom_configuration( File "/app/api/core/provider_manager.py", line 607, in _to_custom_configuration self.decoding_rsa_key, self.decoding_cipher_rsa = encrypter.get_decrypt_decoding(tenant_id) File "/app/api/core/helper/encrypter.py", line 29, in get_decrypt_decoding return rsa.get_decrypt_decoding(tenant_id) File "/app/api/libs/rsa.py", line 55, in get_decrypt_decoding private_key = storage.load(filepath) File "/app/api/extensions/ext_storage.py", line 55, in load return self.load_once(filename) File "/app/api/extensions/ext_storage.py", line 58, in load_once return self.storage_runner.load_once(filename) File "/app/api/extensions/storage/s3_storage.py", line 38, in load_once data = client.get_object(Bucket=self.bucket_name, Key=filename)['Body'].read() File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call return self._make_api_call(operation_name, kwargs) File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call raise error_class(parsed_response, operation_name) botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied. 2024-07-23 01:55:44,723.723 ERROR [Dummy-642] [completion.py:80] - internal server error. Traceback (most recent call last): File "/app/api/controllers/service_api/app/completion.py", line 53, in post response = AppGenerateService.generate( File "/app/api/services/app_generate_service.py", line 40, in generate return rate_limit.generate(CompletionAppGenerator().generate( File "/app/api/core/app/apps/completion/app_generator.py", line 105, in generate model_conf=ModelConfigConverter.convert(app_config), File "/app/api/core/app/app_config/easy_ui_based_app/model_config/converter.py", line 27, in convert provider_model_bundle = provider_manager.get_provider_model_bundle( File "/app/api/core/provider_manager.py", line 200, in get_provider_model_bundle provider_configurations = self.get_configurations(tenant_id) File "/app/api/core/provider_manager.py", line 125, in get_configurations custom_configuration = self._to_custom_configuration( File "/app/api/core/provider_manager.py", line 607, in _to_custom_configuration self.decoding_rsa_key, self.decoding_cipher_rsa = encrypter.get_decrypt_decoding(tenant_id) File "/app/api/core/helper/encrypter.py", line 29, in get_decrypt_decoding return rsa.get_decrypt_decoding(tenant_id) File "/app/api/libs/rsa.py", line 55, in get_decrypt_decoding private_key = storage.load(filepath) File "/app/api/extensions/ext_storage.py", line 55, in load return self.load_once(filename) File "/app/api/extensions/ext_storage.py", line 58, in load_once return self.storage_runner.load_once(filename) File "/app/api/extensions/storage/s3_storage.py", line 38, in load_once data = client.get_object(Bucket=self.bucket_name, Key=filename)['Body'].read() File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call return self._make_api_call(operation_name, kwargs) File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call raise error_class(parsed_response, operation_name) botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied. 2024-07-23 01:55:44,724.724 ERROR [Dummy-642] [app.py:838] - Exception on /v1/completion-messages [POST] Traceback (most recent call last): File "/app/api/controllers/service_api/app/completion.py", line 53, in post response = AppGenerateService.generate( File "/app/api/services/app_generate_service.py", line 40, in generate return rate_limit.generate(CompletionAppGenerator().generate( File "/app/api/core/app/apps/completion/app_generator.py", line 105, in generate model_conf=ModelConfigConverter.convert(app_config), File "/app/api/core/app/app_config/easy_ui_based_app/model_config/converter.py", line 27, in convert provider_model_bundle = provider_manager.get_provider_model_bundle( File "/app/api/core/provider_manager.py", line 200, in get_provider_model_bundle provider_configurations = self.get_configurations(tenant_id) File "/app/api/core/provider_manager.py", line 125, in get_configurations custom_configuration = self._to_custom_configuration( File "/app/api/core/provider_manager.py", line 607, in _to_custom_configuration self.decoding_rsa_key, self.decoding_cipher_rsa = encrypter.get_decrypt_decoding(tenant_id) File "/app/api/core/helper/encrypter.py", line 29, in get_decrypt_decoding return rsa.get_decrypt_decoding(tenant_id) File "/app/api/libs/rsa.py", line 55, in get_decrypt_decoding private_key = storage.load(filepath) File "/app/api/extensions/ext_storage.py", line 55, in load return self.load_once(filename) File "/app/api/extensions/ext_storage.py", line 58, in load_once return self.storage_runner.load_once(filename) File "/app/api/extensions/storage/s3_storage.py", line 38, in load_once data = client.get_object(Bucket=self.bucket_name, Key=filename)['Body'].read() File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call return self._make_api_call(operation_name, kwargs) File "/app/api/.venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call raise error_class(parsed_response, operation_name) botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request rv = self.dispatch_request() File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 489, in wrapper resp = resource(*args, **kwargs) File "/app/api/.venv/lib/python3.10/site-packages/flask/views.py", line 110, in view return current_app.ensure_sync(self.dispatch_request)(**kwargs) # type: ignore[no-any-return] File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/init.py", line 604, in dispatch_request resp = meth(*args, **kwargs) File "/app/api/controllers/service_api/wraps.py", line 75, in decorated_view return view_func(*args, **kwargs) File "/app/api/controllers/service_api/app/completion.py", line 81, in post raise InternalServerError() werkzeug.exceptions.InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Some Thoughts

I think the problematic code might be in the following code snippet. Because the db.session.close() method is executed in the finally block, it means the close method is only executed after the entire program has finished running. This results in operations that involve long-duration LLM requests occurring between the creation and closing of the session.

def _generate_worker(self, flask_app: Flask,
                         application_generate_entity: CompletionAppGenerateEntity,
                         queue_manager: AppQueueManager,
                         message_id: str) -> None:
        """
        Generate worker in a new thread.
        :param flask_app: Flask app
        :param application_generate_entity: application generate entity
        :param queue_manager: queue manager
        :param message_id: message ID
        :return:
        """
        with flask_app.app_context():
            try:
                # get message
                message = self._get_message(message_id)

                # chatbot app
                runner = CompletionAppRunner()
                runner.run(
                    application_generate_entity=application_generate_entity,
                    queue_manager=queue_manager,
                    message=message
                )
            except GenerateTaskStoppedException:
                pass
            except InvokeAuthorizationError:
                queue_manager.publish_error(
                    InvokeAuthorizationError('Incorrect API key provided'),
                    PublishFrom.APPLICATION_MANAGER
                )
            except ValidationError as e:
                logger.exception("Validation Error when generating")
                queue_manager.publish_error(e, PublishFrom.APPLICATION_MANAGER)
            except (ValueError, InvokeError) as e:
                if os.environ.get("DEBUG") and os.environ.get("DEBUG").lower() == 'true':
                    logger.exception("Error when generating")
                queue_manager.publish_error(e, PublishFrom.APPLICATION_MANAGER)
            except Exception as e:
                logger.exception("Unknown Error when generating")
                queue_manager.publish_error(e, PublishFrom.APPLICATION_MANAGER)
            finally:
                db.session.close()

secbr avatar Jul 24 '24 02:07 secbr

although the finally block has db.session.close, the code here inside runner.run() before invoke_llm() also has a db.session.close

maybe the error is caused by PostgreSQL's max_connections config?

hjlarry avatar Jul 24 '24 02:07 hjlarry

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "10.1.0.11", port 5432 failed: FATAL: sorry, too many clients already. I have a complex workflow and I am the only user.

getting this error very frequently, is there any config to tweak?

TY

ruzzzz6312 avatar Aug 14 '24 00:08 ruzzzz6312

I also encountered this issue. By mapping the PostgreSQL database configuration file, modifying the maximum number of connections, expanding performance settings, and adjusting the connection pool size in the env file, the problem no longer occurs. However, I don't think this is the best solution.

ChiayenGu avatar Sep 05 '24 12:09 ChiayenGu

This is still very relevant. Facing it from time to time, causing downtime for accessing Dify.

mbbyn avatar Mar 03 '25 11:03 mbbyn

i have this also, as single user during a knowledge Stack creation embedding phase. Just creating two stacks at the same time. One with LM-Studio endpoint and one with Xinference endpoint. Nothing special. As i have a very beefy server i´m happy to change some sql settings if i would know which one....

mamema avatar Mar 22 '25 18:03 mamema

How to adjust the parameter values, whether to increase or decrease them, and specifically how to make the adjustments? pool_size and max_connections

LanYu4qz avatar May 21 '25 01:05 LanYu4qz