Error reported when adding text or CSV files to the knowledge base in version 1.0.0

Open mryang513662932 opened this issue 10 months ago • 9 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.0.0-beta.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

After uploading the file, clicking the preview block button results in an error:

{
  "code": "indexing_estimate_error",
  "message": "PluginInvokeError: {\"error_type\":\"unmarshal_error\",\"message\":\"unmarshal json failed: json: cannot unmarshal number into Go struct field GetTextEmbeddingNumTokensResponse.num_tokens of type []int\"}",
  "status": 500
}

error.log

✔️ Expected Behavior

The block preview loads normally.

❌ Actual Behavior

API error

mryang513662932 avatar Feb 13 '25 05:02 mryang513662932

@mryang513662932 Hello there! I'm here to help you with bugs, questions, and becoming a contributor.

The error you're encountering is due to a mismatch in the expected data type for the num_tokens field in the GetTextEmbeddingNumTokensResponse struct. The error message indicates that the system is trying to unmarshal a number into a field that is expected to be of type []int (a slice of integers) in Go.

To resolve this issue, you should check the following:

  1. Data Type Consistency: Ensure that the API response is providing the num_tokens field as a slice of integers ([]int) rather than a single number. If the API is returning a single number, you may need to adjust the struct definition to match the actual data type being returned.

  2. Struct Definition: Verify the struct definition in your Go code to ensure it matches the expected structure of the API response. If the API response format cannot be changed, you may need to update the struct to correctly handle the data type being returned.

  3. API Response: If you have control over the API, ensure that it returns the num_tokens field in the correct format that matches the Go struct definition.

By aligning the data types between the API response and the Go struct, you should be able to resolve the unmarshalling error.
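The mismatch described above can be reproduced in a few lines. The sketch below is a Python stand-in for strict decoding of the kind the Go error message implies, not Dify's or the plugin daemon's actual code: `decode_num_tokens` and `normalize_num_tokens` are hypothetical names, and the two payload shapes are assumptions inferred from the error text.

```python
import json


def decode_num_tokens(raw: str) -> list[int]:
    """Strictly decode a payload whose num_tokens must be a list of ints.

    A bare number is rejected rather than silently wrapped, which is the
    behaviour the reported unmarshal error suggests.
    """
    payload = json.loads(raw)
    value = payload["num_tokens"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, list) or not all(
        isinstance(n, int) and not isinstance(n, bool) for n in value
    ):
        raise TypeError(
            f"num_tokens must be a list of ints, got {type(value).__name__}"
        )
    return value


def normalize_num_tokens(value) -> list[int]:
    """Hypothetical producer-side fix: always emit a list of counts."""
    if isinstance(value, int) and not isinstance(value, bool):
        return [value]
    return list(value)


# Shape the consumer expects: one token count per input text.
print(decode_num_tokens('{"num_tokens": [12, 7]}'))

# Shape that would trigger the reported failure: a single number.
try:
    decode_num_tokens('{"num_tokens": 19}')
except TypeError as exc:
    print("rejected:", exc)

# Wrapping the scalar before serialising makes the payload acceptable.
fixed = json.dumps({"num_tokens": normalize_num_tokens(19)})
print(decode_num_tokens(fixed))
```

Whether the producer or the consumer should change depends on which side is out of date; the sketch only shows that the two sides must agree on one shape.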

dosubot[bot] avatar Feb 13 '25 05:02 dosubot[bot]

Have you solved your problem? I have the same problem

yaomindong1996 avatar Feb 28 '25 01:02 yaomindong1996

Have you solved your problem? I have the same problem

Not yet, waiting for the official fix.

mryang513662932 avatar Feb 28 '25 01:02 mryang513662932

Same issue here, but with a PDF file. It started after the v1.0.0 update.

daniel-kress avatar Mar 04 '25 16:03 daniel-kress

I have some more logs about the issue. I've deployed v1.0.0 via Docker Compose.

The logs show the following errors when I try to embed a PDF:

api-1 | 2025-03-05 09:51:08.103 ERROR [Dummy-1] [app.py:875] - Exception on /console/api/datasets/indexing-estimate [POST]
api-1 | Traceback (most recent call last):
api-1 |   File "/app/api/controllers/console/datasets/datasets.py", line 452, in post
api-1 |     response = indexing_runner.indexing_estimate(
api-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/indexing_runner.py", line 290, in indexing_estimate
api-1 |     documents = index_processor.transform(
api-1 |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 58, in transform
api-1 |     document_nodes = splitter.split_documents([document])
api-1 |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/rag/splitter/text_splitter.py", line 96, in split_documents
api-1 |     return self.create_documents(texts, metadatas=metadatas)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/rag/splitter/text_splitter.py", line 81, in create_documents
api-1 |     for chunk in self.split_text(text):
api-1 |                  ^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 68, in split_text
api-1 |     chunks_lengths = self._length_function(chunks)
api-1 |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/rag/splitter/fixed_text_splitter.py", line 38, in _token_encoder
api-1 |     return embedding_model_instance.get_text_embedding_num_tokens(texts=texts)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/model_manager.py", line 244, in get_text_embedding_num_tokens
api-1 |     self._round_robin_invoke(
api-1 |   File "/app/api/core/model_manager.py", line 370, in _round_robin_invoke
api-1 |     return function(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 65, in get_num_tokens
api-1 |     return plugin_model_manager.get_text_embedding_num_tokens(
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/core/plugin/manager/model.py", line 313, in get_text_embedding_num_tokens
api-1 |     for resp in response:
api-1 |                 ^^^^^^^^
api-1 |   File "/app/api/core/plugin/manager/base.py", line 189, in _request_with_plugin_daemon_response_stream
api-1 |     self._handle_plugin_daemon_error(error.error_type, error.message)
api-1 |   File "/app/api/core/plugin/manager/base.py", line 221, in _handle_plugin_daemon_error
api-1 |     raise PluginInvokeError(description=message)
api-1 | core.plugin.manager.exc.PluginInvokeError: PluginInvokeError: {"error_type":"unmarshal_error","message":"unmarshal json failed: json: cannot unmarshal number into Go struct field GetTextEmbeddingNumTokensResponse.num_tokens of type []int"}
api-1 |
api-1 | During handling of the above exception, another exception occurred:
api-1 |
api-1 | Traceback (most recent call last):
api-1 |   File "/app/api/.venv/lib/python3.12/site-packages/flask/app.py", line 917, in full_dispatch_request
api-1 |     rv = self.dispatch_request()
api-1 |          ^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/.venv/lib/python3.12/site-packages/flask/app.py", line 902, in dispatch_request
api-1 |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/.venv/lib/python3.12/site-packages/flask_restful/__init__.py", line 489, in wrapper
api-1 |     resp = resource(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/.venv/lib/python3.12/site-packages/flask/views.py", line 110, in view
api-1 |     return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/.venv/lib/python3.12/site-packages/flask_restful/__init__.py", line 604, in dispatch_request
api-1 |     resp = meth(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/controllers/console/wraps.py", line 147, in decorated
api-1 |     return view(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/libs/login.py", line 94, in decorated_view
api-1 |     return current_app.ensure_sync(func)(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/controllers/console/wraps.py", line 27, in decorated
api-1 |     return view(*args, **kwargs)
api-1 |            ^^^^^^^^^^^^^^^^^^^^^
api-1 |   File "/app/api/controllers/console/datasets/datasets.py", line 468, in post
api-1 |     raise IndexingEstimateError(str(e))
api-1 | controllers.console.datasets.error.IndexingEstimateError: 500 Internal Server Error: PluginInvokeError: {"error_type":"unmarshal_error","message":"unmarshal json failed: json: cannot unmarshal number into Go struct field GetTextEmbeddingNumTokensResponse.num_tokens of type []int"}

It seems to originate in the "plugin_model_manager" class of the "API" container.
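The last line of the traceback wraps a JSON object inside a plain-text message, which makes the actual daemon error easy to miss. Here is a minimal sketch for pulling it out when reading such logs; `extract_plugin_error` is a hypothetical helper written for illustration and is not part of Dify.

```python
import json


def extract_plugin_error(message: str) -> dict:
    """Return the first JSON object embedded in an error message."""
    start = message.find("{")
    if start == -1:
        raise ValueError("no JSON object found in message")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _end = json.JSONDecoder().raw_decode(message[start:])
    return obj


# Message text copied from the final exception line in the log.
log_line = (
    "500 Internal Server Error: PluginInvokeError: "
    '{"error_type":"unmarshal_error","message":"unmarshal json failed: '
    "json: cannot unmarshal number into Go struct field "
    'GetTextEmbeddingNumTokensResponse.num_tokens of type []int"}'
)

err = extract_plugin_error(log_line)
print(err["error_type"])  # unmarshal_error
print(err["message"])
```

The `error_type` of `unmarshal_error` indicates that the failure happens while decoding a response, not while computing the embedding itself.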

daniel-kress avatar Mar 05 '25 09:03 daniel-kress

You can try removing all local containers and clearing the mounted data (make sure to delete the locally mounted files), then redeploying 1.0.0. I'm not sure this will solve everything, and remember to back up your data first.

mryang513662932 avatar Mar 05 '25 10:03 mryang513662932

I also encountered this problem when using the Docker image dify-plugin-daemon:0.0.3-local, but the problem goes away if you compile and run it locally from the latest code.

hieheihei avatar Mar 09 '25 12:03 hieheihei

I also encountered this problem when using the Docker image dify-plugin-daemon:0.0.3-local, but the problem goes away if you compile and run it locally from the latest code.

Thanks for the hint! Should I deploy only the dify-plugin-daemon:0.0.3 service natively, or should I compile and run everything directly with Poetry?

Further, do you know if that will be fixed in the next update? Is it on the roadmap in any way?

daniel-kress avatar Mar 10 '25 12:03 daniel-kress

Same error here. Hoping for a fix.

Yuxiang1995 avatar Mar 11 '25 07:03 Yuxiang1995

Which plugin daemon version and which embedding model are you using?

Currently, we use image: langgenius/dify-plugin-daemon:0.0.4-local

crazywoola avatar Mar 18 '25 05:03 crazywoola