After upgrading from v0.5.3 to v1.0.0, the knowledge base cannot parse and vectorize documents, and an error is reported directly.

Open d960124 opened this issue 9 months ago • 5 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

v1.0.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

After upgrading from v0.5.3 to v1.0.0, the knowledge base cannot parse and vectorize documents, and an error is reported directly.

Upgrade process:

  • Back up the volumes directory.
  • Download the new project files from GitHub.
  • Update the three images: langgenius/dify-web:1.0.0, docker-api-1, and langgenius/dify-plugin-daemon:0.0.3-local.
  • Run docker-compose down, then docker-compose up -d.

Then execute in sequence:

  • poetry run flask extract-plugins --workers=20
  • poetry run flask db upgrade
  • poetry run flask migrate-data-for-plugin

A total of 2 servers were upgraded. Server A could not download plugins online, so the installation packages were downloaded manually from the official website and installed through local upload. Server B could install plugins directly online.

The current problem is that the knowledge base on server A cannot parse newly added documents, while server B works normally. In addition, both servers share a second problem: after a document in the knowledge base is archived, it can no longer be unarchived, and attempting to unarchive it reports an error.

Image

Image

The following is the log of the docker-worker-1 container of server A:

2025-03-04 13:32:53.391 INFO [MainThread] [connection.py:22] - Connected to redis://:@redis:6379/1
2025-03-04 13:32:53.394 INFO [MainThread] [mingle.py:40] - mingle: searching for neighbors
2025-03-04 13:32:54.401 INFO [MainThread] [mingle.py:49] - mingle: all alone
2025-03-04 13:32:54.412 INFO [MainThread] [worker.py:175] - celery@82791388ba25 ready.
2025-03-04 13:32:54.414 INFO [Dummy-1] [pidbox.py:111] - pidbox: Connected to redis://:@redis:6379/1.
2025-03-04 13:35:40.862 INFO [MainThread] [strategy.py:161] - Task tasks.document_indexing_task.document_indexing_task[96b7176c-40d8-41ef-8cab-e65f6ae466f3] received
2025-03-04 13:35:40.898 INFO [Dummy-2] [document_indexing_task.py:59] - Start process document: 38cc0d19-97e6-431c-a615-76beeaa15b07
2025-03-04 13:35:40.903 INFO [Dummy-2] [document_indexing_task.py:59] - Start process document: 5025fdf0-98f0-4665-b1ce-2f8887521495
2025-03-04 13:39:41.024 ERROR [Dummy-2] [indexing_runner.py:96] - consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 73, in run
    documents = self._transform(
  File "/app/api/core/indexing_runner.py", line 706, in _transform
    documents = index_processor.transform(
  File "/app/api/core/rag/index_processor/processor/parent_child_index_processor.py", line 56, in transform
    document_nodes = splitter.split_documents([document])
  File "/app/api/core/rag/splitter/text_splitter.py", line 96, in split_documents
    return self.create_documents(texts, metadatas=metadatas)
  File "/app/api/core/rag/splitter/text_splitter.py", line 81, in create_documents
    for chunk in self.split_text(text):
  File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 68, in split_text
    chunks_lengths = self._length_function(chunks)
  File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 38, in _token_encoder
    return embedding_model_instance.get_text_embedding_num_tokens(texts=texts)
  File "/app/api/core/model_manager.py", line 244, in get_text_embedding_num_tokens
    self._round_robin_invoke(
  File "/app/api/core/model_manager.py", line 370, in _round_robin_invoke
    return function(*args, **kwargs)
  File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 65, in get_num_tokens
    return plugin_model_manager.get_text_embedding_num_tokens(
  File "/app/api/core/plugin/manager/model.py", line 313, in get_text_embedding_num_tokens
    for resp in response:
  File "/app/api/core/plugin/manager/base.py", line 189, in _request_with_plugin_daemon_response_stream
    self._handle_plugin_daemon_error(error.error_type, error.message)
  File "/app/api/core/plugin/manager/base.py", line 223, in _handle_plugin_daemon_error
    raise PluginDaemonInternalServerError(description=message)
core.plugin.manager.exc.PluginDaemonInternalServerError: PluginDaemonInternalServerError: killed by timeout
2025-03-04 13:39:41.062 WARNING [Dummy-2] [warnings.py:112] - /app/api/.venv/lib/python3.12/site-packages/pypdfium2/_helpers/textpage.py:80: UserWarning: get_text_range() call with default params will be implicitly redirected to get_text_bounded()
  warnings.warn("get_text_range() call with default params will be implicitly redirected to get_text_bounded()")

2025-03-04 13:43:41.368 ERROR [Dummy-2] [indexing_runner.py:96] - consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 73, in run
    documents = self._transform(
  File "/app/api/core/indexing_runner.py", line 706, in _transform
    documents = index_processor.transform(
  File "/app/api/core/rag/index_processor/processor/parent_child_index_processor.py", line 56, in transform
    document_nodes = splitter.split_documents([document])
  File "/app/api/core/rag/splitter/text_splitter.py", line 96, in split_documents
    return self.create_documents(texts, metadatas=metadatas)
  File "/app/api/core/rag/splitter/text_splitter.py", line 81, in create_documents
    for chunk in self.split_text(text):
  File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 68, in split_text
    chunks_lengths = self._length_function(chunks)
  File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 38, in _token_encoder
    return embedding_model_instance.get_text_embedding_num_tokens(texts=texts)
  File "/app/api/core/model_manager.py", line 244, in get_text_embedding_num_tokens
    self._round_robin_invoke(
  File "/app/api/core/model_manager.py", line 370, in _round_robin_invoke
    return function(*args, **kwargs)
  File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 65, in get_num_tokens
    return plugin_model_manager.get_text_embedding_num_tokens(
  File "/app/api/core/plugin/manager/model.py", line 313, in get_text_embedding_num_tokens
    for resp in response:
  File "/app/api/core/plugin/manager/base.py", line 189, in _request_with_plugin_daemon_response_stream
    self._handle_plugin_daemon_error(error.error_type, error.message)
  File "/app/api/core/plugin/manager/base.py", line 223, in _handle_plugin_daemon_error
    raise PluginDaemonInternalServerError(description=message)
core.plugin.manager.exc.PluginDaemonInternalServerError: PluginDaemonInternalServerError: killed by timeout
2025-03-04 13:43:41.375 INFO [Dummy-2] [document_indexing_task.py:76] - Processed dataset: d7eaadb1-039a-4c99-b3fb-131c03caa9b1 latency: 480.5118366936222
2025-03-04 13:43:41.426 INFO [Dummy-2] [trace.py:128] - Task tasks.document_indexing_task.document_indexing_task[96b7176c-40d8-41ef-8cab-e65f6ae466f3] succeeded in 480.56343429908156s: None
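
The timestamps in the worker log hint at how long each document ran before it was killed. The following is a minimal sketch that uses only timestamps copied from that log; the helper name `seconds_between` and the variable names are illustrative, not part of Dify.

```python
from datetime import datetime

# Timestamp layout used by the worker log lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the docker-worker-1 log of server A.
started = "2025-03-04 13:35:40.903"         # last "Start process document" line
first_failure = "2025-03-04 13:39:41.024"   # first "consume document failed"
second_failure = "2025-03-04 13:43:41.368"  # second "consume document failed"

first_gap = seconds_between(started, first_failure)
second_gap = seconds_between(first_failure, second_failure)
print(f"{first_gap:.3f} s, {second_gap:.3f} s")
```

Both gaps come out at roughly 240 seconds, and together they account for the reported task latency of about 480 seconds. That pattern is consistent with a fixed limit of around four minutes per request, although the log alone does not confirm the configured value.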

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

d960124 avatar Mar 04 '25 14:03 d960124

To resolve the "PluginDaemonInternalServerError: killed by timeout" error after upgrading from Dify v0.5.3 to v1.0.0, you need to increase the plugin timeout setting. You can do this by setting the PluginMaxExecutionTimeout parameter in your .env file to a higher value, specified in seconds. For example, to allow for 10 minutes of execution time, set it to 600 seconds [1].


dosubot[bot] avatar Mar 04 '25 14:03 dosubot[bot]

The PluginMaxExecutionTimeout parameter is not in my .env file. Do I need to add it? @Dosu

d960124 avatar Mar 04 '25 14:03 d960124

I also encountered the same problem.

ccooder avatar Mar 05 '25 01:03 ccooder

core.plugin.manager.exc.PluginDaemonInternalServerError: PluginDaemonInternalServerError: killed by timeout

It is indeed the execution timeout. I can process a small file successfully, but it fails with a large file.

The error occurs during document conversion, specifically when calculating the number of tokens in the text to be embedded. The .env file contains no example of the relevant parameter, so I don't know whether I need to add a PluginMaxExecutionTimeout parameter manually. @dosu
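
The traceback shows the token count being requested for the whole list of chunks in one call (`self._length_function(chunks)`), which may explain why only large files hit the timeout: the single request grows with the document. The sketch below is not Dify's implementation. `count_tokens` is a hypothetical stand-in for the embedding model's token counter, `fake_count_tokens` is a toy replacement for it, and `batch_size` is an arbitrary illustrative value. It only illustrates how the same work could be split into smaller requests.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def count_tokens_in_batches(
    chunks: Sequence[str],
    count_tokens: Callable[[Sequence[str]], List[int]],
    batch_size: int = 32,
) -> List[int]:
    """Call count_tokens once per batch and concatenate the results in order."""
    lengths: List[int] = []
    for batch in batched(chunks, batch_size):
        lengths.extend(count_tokens(batch))
    return lengths


def fake_count_tokens(texts: Sequence[str]) -> List[int]:
    """Hypothetical counter: treats whitespace-separated words as tokens."""
    return [len(text.split()) for text in texts]


chunks = ["one two", "three", "four five six"]
lengths = count_tokens_in_batches(chunks, fake_count_tokens, batch_size=2)
print(lengths)
```

With smaller batches, each individual request does less work, so a per-request time limit is less likely to be reached, at the cost of more round trips.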

d960124 avatar Mar 05 '25 01:03 d960124

Yes, you need to manually add the PluginMaxExecutionTimeout parameter to the .env file to resolve the "PluginDaemonInternalServerError: killed by timeout" issue during document conversion. Set the value in seconds, for example, 600 for 10 minutes [1].
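
Before adding the setting, it can help to check whether the .env file already defines it. The sketch below is a small hedged helper, not an official tool. The reply above names the parameter PluginMaxExecutionTimeout, and this thread does not confirm the exact key spelling expected in the .env file, so the key is passed in as an argument; the `PLUGIN_MAX_EXECUTION_TIMEOUT` spelling and the sample file contents are assumptions for illustration.

```python
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value assigned to key in .env-style text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


# Hypothetical .env contents; the timeout key spelling is an assumption.
sample = "# sample settings\nREDIS_HOST=redis\nPLUGIN_MAX_EXECUTION_TIMEOUT=600\n"

timeout = read_env_value(sample, "PLUGIN_MAX_EXECUTION_TIMEOUT")
missing = read_env_value(sample, "NOT_SET")
print(timeout, missing)
```

In practice, the text would come from reading the real .env file, and a result of None would mean the line has to be added by hand.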


dosubot[bot] avatar Mar 05 '25 01:03 dosubot[bot]

We fixed this in https://github.com/langgenius/dify-plugin-daemon/pull/59

  plugin_daemon:
    image: langgenius/dify-plugin-daemon:e0672c3c1a6451437e8f4b63b260c8b0863c9c80-local

crazywoola avatar Mar 06 '25 07:03 crazywoola

We fixed in langgenius/dify-plugin-daemon#59

  plugin_daemon:
    image: langgenius/dify-plugin-daemon:e0672c3c1a6451437e8f4b63b260c8b0863c9c80-local

I am using Dify 1.0.0. After running "docker pull langgenius/dify-plugin-daemon:e0672c3c1a6451437e8f4b63b260c8b0863c9c80-local", I can open the web page, but I cannot open the plugin page in the web UI.

Image

fengxin215 avatar Mar 07 '25 11:03 fengxin215