db connection error when embedding task takes a long time

Open lordk911 opened this issue 2 months ago • 4 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese users & non-English users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I uploaded a CSV file with approximately 24,000 rows and used newline characters as delimiters for segmentation, the embedding process took a long time and eventually failed with a database connection error. However, when I split the same file into 10 smaller files (each containing about 2,400 rows), the process completed successfully without any issues.
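The workaround above (one large file split into several smaller ones) can be scripted. A minimal sketch, assuming the CSV has a header row that should be repeated in every part; `split_csv`, `_write_part`, and the `_partNNN` file naming are hypothetical helpers, not part of Dify:

```python
import csv
from pathlib import Path


def _write_part(out_dir: Path, stem: str, index: int,
                header: list[str], rows: list[list[str]]) -> Path:
    """Write one part file: the header followed by the given rows."""
    path = out_dir / f"{stem}_part{index:03d}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(src: Path, out_dir: Path, rows_per_file: int = 2400) -> list[Path]:
    """Split src into files holding at most rows_per_file data rows each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumption: the first row is a header
        if header is None:
            return written  # empty input, nothing to split
        chunk: list[list[str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(
                    _write_part(out_dir, src.stem, len(written) + 1, header, chunk)
                )
                chunk = []
        if chunk:  # flush the final, possibly shorter, part
            written.append(
                _write_part(out_dir, src.stem, len(written) + 1, header, chunk)
            )
    return written
```

For example, `split_csv(Path("data.csv"), Path("parts"))` would produce roughly ten files for a 24,000-row input at the default of 2,400 rows per file.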

The error report is as follows:

2025-11-05 08:22:03,749.749 INFO [ThreadPoolExecutor-10_0] [vector_factory.py:194] - Embedding batch 3/3454 took 138.00466442108154 s
2025-11-05 08:22:04,497.497 INFO [ThreadPoolExecutor-10_0] [vector_factory.py:198] - Embedding 2455 texts took 2888.9533185958862 s
2025-11-05 08:22:04,888.888 ERROR [Dummy-33] [indexing_runner.py:97] - consume document failed
Traceback (most recent call last):
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2118, in _exec_insertmany_context
    dialect.do_execute(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 951, in do_execute
    cursor.execute(statement, parameters)
  File "/app/api/.venv/lib/python3.12/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback
    state = conn.poll()
            ^^^^^^^^^^^
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 81, in run
    self._load(
  File "/app/api/core/indexing_runner.py", line 571, in _load
    tokens += future.result()
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/indexing_runner.py", line 630, in _process_chunk
    index_processor.load(dataset, chunk_documents, with_keywords=False)
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 82, in load
    vector.create(documents)
  File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 193, in create
    batch_embeddings = self._embeddings.embed_documents([document.page_content for document in batch])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/embedding/cached_embedding.py", line 103, in embed_documents
    raise ex
  File "/app/api/core/rag/embedding/cached_embedding.py", line 97, in embed_documents
    db.session.commit()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/scoping.py", line 599, in commit
    return self._proxied.commit()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2032, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1313, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1288, in _prepare_impl
    self.session.flush()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4345, in flush
    self._flush(objects)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4480, in _flush
    with util.safe_reraise():
         ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 224, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4441, in _flush
    flush_context.execute()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1143, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1419, in execute
    return meth(
           ^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 526, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1641, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_insertmany_context(dialect, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2126, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2118, in _exec_insertmany_context
    dialect.do_execute(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 951, in do_execute
    cursor.execute(statement, parameters)
  File "/app/api/.venv/lib/python3.12/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback
    state = conn.poll()
            ^^^^^^^^^^^
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[SQL: INSERT INTO embeddings (model_name, hash, embedding, provider_name) VALUES (%(model_name)s, %(hash)s, %(embedding)s, %(provider_name)s) RETURNING embeddings.id, embeddings.created_at]
[parameters: {'model_name': 'bge-m3', 'hash': '33814e1d011f86c8b67a7300178ec27291e3121a6db01c860c64f1ba3f32cbe3', 'embedding': <psycopg2.extensions.Binary object at 0x7fc12d4be0a0>, 'provider_name': 'langgenius/xinference/xinference'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
2025-11-05 08:22:04,895.895 ERROR [Dummy-33] [retry_document_indexing_task.py:120] - retry_document_indexing_task failed, dataset_id: 377f0818-fba6-486e-b7ec-382b339aee9b, document_ids: ['d8537b06-a879-460e-82df-fc19689765d2']
Traceback (most recent call last):
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2118, in _exec_insertmany_context
    dialect.do_execute(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 951, in do_execute
    cursor.execute(statement, parameters)
  File "/app/api/.venv/lib/python3.12/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback
    state = conn.poll()
            ^^^^^^^^^^^
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 81, in run
    self._load(
  File "/app/api/core/indexing_runner.py", line 571, in _load
    tokens += future.result()
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/indexing_runner.py", line 630, in _process_chunk
    index_processor.load(dataset, chunk_documents, with_keywords=False)
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 82, in load
    vector.create(documents)
  File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 193, in create
    batch_embeddings = self._embeddings.embed_documents([document.page_content for document in batch])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/rag/embedding/cached_embedding.py", line 103, in embed_documents
    raise ex
  File "/app/api/core/rag/embedding/cached_embedding.py", line 97, in embed_documents
    db.session.commit()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/scoping.py", line 599, in commit
    return self._proxied.commit()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2032, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1313, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1288, in _prepare_impl
    self.session.flush()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4345, in flush
    self._flush(objects)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4480, in _flush
    with util.safe_reraise():
         ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 224, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4441, in _flush
    flush_context.execute()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1143, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1419, in execute
    return meth(
           ^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 526, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1641, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_insertmany_context(dialect, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2126, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2118, in _exec_insertmany_context
    dialect.do_execute(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 951, in do_execute
    cursor.execute(statement, parameters)
  File "/app/api/.venv/lib/python3.12/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback
    state = conn.poll()
            ^^^^^^^^^^^

✔️ Expected Behavior

The embedding should complete normally.

❌ Actual Behavior

Embedding processing...

lordk911 avatar Nov 05 '25 08:11 lordk911

The PR https://github.com/langgenius/dify/pull/26196 was released with 1.9.1, but the issue still occurs. I found that these changes no longer exist in the main branch.

lordk911 avatar Nov 05 '25 08:11 lordk911

You're correct—the changes from PR #26196 (explicitly closing the DB session before embedding to prevent timeouts) are not present in the main branch or in the 1.9.1 release, despite the PR being merged and referenced in release notes. This likely means the fix was reverted, overwritten, or not properly merged into the release branch, which matches your observation and explains why the issue persists in 1.9.1. The embedding pipeline still risks holding a long-lived DB connection during lengthy embedding tasks, leading to the PostgreSQL disconnects you’re seeing (source).

For a confirmed fix, update to Dify 1.9.2 or later, where the session management patch is included and users have reported successful resolution of this exact problem (source). If you can't upgrade, you can manually patch your deployment by ensuring the DB session is closed before starting long embedding operations, as described in PR #26196.
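The idea behind that workaround can be illustrated without Dify's code. The sketch below is a stand-in under stated assumptions, not the patch from PR #26196: it uses the standard-library `sqlite3` module in place of PostgreSQL and SQLAlchemy, and `embed_and_store`, the `embed` callback, and the two-column `embeddings` table are hypothetical. It shows only the pattern: read, release the connection, run the slow embedding, then open a fresh connection to write.

```python
import json
import sqlite3
from contextlib import closing
from typing import Callable, Sequence

# Hypothetical embedder type: takes texts, returns one vector per text.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def embed_and_store(db_path: str, texts: Sequence[str], embed: Embedder) -> int:
    """Embed texts that are not cached yet; return how many were embedded."""
    # Step 1: short-lived connection, used only to look up cached entries.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(text TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )
        conn.commit()
        known = {row[0] for row in conn.execute("SELECT text FROM embeddings")}
    # The connection is closed here, before the slow step begins.

    # Deduplicate while preserving order.
    pending = list(dict.fromkeys(t for t in texts if t not in known))
    if not pending:
        return 0

    # Step 2: the long-running call holds no database connection.
    vectors = embed(pending)
    if len(vectors) != len(pending):
        raise ValueError("embedder returned a different number of vectors")

    # Step 3: a fresh connection for the write, so no idle timeout applies.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executemany(
            "INSERT OR IGNORE INTO embeddings (text, vector) VALUES (?, ?)",
            [(t, json.dumps(v)) for t, v in zip(pending, vectors)],
        )
        conn.commit()
    return len(pending)
```

The design choice is that no connection is held across the embedding call, so a server-side idle timeout during a long embedding run cannot invalidate the connection used for the final insert.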

Let me know if you need a code snippet or more details on how to apply the workaround.

dosubot[bot] avatar Nov 05 '25 09:11 dosubot[bot]

Moreover, I've noticed that after this issue occurs, the number of database connections from Dify nearly doubles.

lordk911 avatar Nov 05 '25 09:11 lordk911

This doubling of database connections after an embedding failure is likely due to a connection leak in the current Dify codebase. When a long-running embedding task fails (for example, due to a timeout or database error), the code only rolls back the session but does not explicitly close or remove it. Because Dify uses a shared scoped session, failed or orphaned connections can accumulate, especially after errors in threaded or concurrent embedding tasks. This leaves connections open in the pool, which can quickly double the number of active connections after repeated failures, matching what you’re seeing (source).

A previous fix that closed the session before embedding was reverted to avoid breaking downstream code, but this also removed the safeguard against leaks (source). Until session cleanup is handled more robustly, this leak will persist after embedding errors. As a workaround, you can manually patch your deployment to ensure sessions are closed or removed after failures, or consider upgrading if a future release addresses this with better session management.
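The cleanup described above (roll back, then always release the connection after a failure) can be sketched as a context manager. This is an illustration under assumptions, not Dify's session handling: it uses the standard-library `sqlite3` module rather than SQLAlchemy, and `session_scope` is a hypothetical helper name. With a SQLAlchemy `scoped_session`, the analogous release step would be calling its `remove()` method.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def session_scope(db_path: str) -> Iterator[sqlite3.Connection]:
    """Yield a connection that is always closed, even if the body raises."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()    # success: persist the work
    except Exception:
        conn.rollback()  # failure: discard partial work
        raise
    finally:
        conn.close()     # both paths: release the connection
```

Because the `close()` call sits in `finally`, a failed task cannot leave its connection open, which is the property missing when the error path only rolls back.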

dosubot[bot] avatar Nov 05 '25 09:11 dosubot[bot]