dify
An error message is displayed after the knowledge base file is uploaded
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.5.8
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Create a knowledge base.
- Upload the .md file and leave all settings at their defaults.
- Wait a moment. The text embedding step ends in the Error state.
Error message screenshot:
docker-worker-1 container error logs
[2024-03-09 16:20:56,842: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 31, in invoke
return self._invoke(model, credentials, texts, user)
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 71, in _invoke
embeddings_batch, embedding_used_tokens = self._embedding_invoke(
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in _embedding_invoke
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in <listcomp>
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/usr/local/lib/python3.10/base64.py", line 80, in b64decode
s = _bytes_from_decode_data(s)
File "/usr/local/lib/python3.10/base64.py", line 45, in _bytes_from_decode_data
raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'list'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 70, in run
self._load(
File "/app/api/core/indexing_runner.py", line 667, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 120, in create
embeddings = self._embeddings.embed_documents([document.page_content for document in texts])
File "/app/api/core/embedding/cached_embedding.py", line 51, in embed_documents
raise ex
File "/app/api/core/embedding/cached_embedding.py", line 35, in embed_documents
embedding_result = self._model_instance.invoke_text_embedding(
File "/app/api/core/model_manager.py", line 95, in invoke_text_embedding
return self.model_type_instance.invoke(
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 33, in invoke
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [openai] Error: argument should be a bytes-like object or ASCII string, not 'list'
✔️ Expected Behavior
The knowledge base can be used normally
❌ Actual Behavior
The uploaded document cannot be processed properly
Hey there @joginwen! Great to see you around again. Hope you're doing well!
Thank you for providing detailed information about the issue you're experiencing. Based on the error logs you've shared, the error occurs during the text embedding process. Specifically, the error message TypeError: argument should be a bytes-like object or ASCII string, not 'list' suggests that the function base64.b64decode() is receiving a list as an argument, whereas it expects a bytes-like object or ASCII string.
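The mechanics of that TypeError can be reproduced outside Dify. The snippet below is a minimal sketch, not Dify code: the `floats` values are made up for illustration and stand in for `data.embedding` from the traceback. It shows that a base64 string decodes cleanly, while a plain list of floats raises the same message seen in the worker log.

```python
import base64

import numpy as np

# Hypothetical values standing in for `data.embedding` in the traceback.
floats = [0.25, -1.5, 3.0]
as_base64 = base64.b64encode(np.array(floats, dtype=np.float32).tobytes()).decode("ascii")

# A base64 string decodes back to the original float32 values.
decoded = np.frombuffer(base64.b64decode(as_base64), dtype=np.float32)
print(decoded.tolist())

# A plain list of floats is rejected by base64.b64decode with a TypeError.
try:
    base64.b64decode(floats)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If the second case matches your log, the embedding arrived as a list of floats rather than as a base64 string.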
This error might be due to the format or content of the uploaded .md file. It's possible that the text embedding function is encountering a list where it expects a string, causing the TypeError.
As a first step, I would recommend checking the content of the .md file you're trying to upload. Make sure that it doesn't contain any unusual formatting or data types that might be interpreted as a list during processing.
If the issue persists, you might want to try the solutions suggested in similar issues:
- Archiving and unarchiving your documents as suggested in this issue.
- Changing the volumes path as suggested in this issue.
Please try these steps and let us know if the issue still persists. We're here to help!
Sources
- Embedding model does not take effect
- The document has been processed. Please refresh the page or go to the document details
- Error when importing text
- api/controllers/service_api/app/file.py
- web/app/components/base/icons/src/public/files/Csv.json
Fixed in PR https://github.com/langgenius/dify/pull/2781. Thanks for your feedback, @joginwen.
After updating the code, I tested again and the same error occurred:
[2024-03-11 12:31:59,277: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 31, in invoke
return self._invoke(model, credentials, texts, user)
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 71, in _invoke
embeddings_batch, embedding_used_tokens = self._embedding_invoke(
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in _embedding_invoke
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in <listcomp>
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/usr/local/lib/python3.10/base64.py", line 80, in b64decode
s = _bytes_from_decode_data(s)
File "/usr/local/lib/python3.10/base64.py", line 45, in _bytes_from_decode_data
raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'list'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 71, in run
self._load(
File "/app/api/core/indexing_runner.py", line 674, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 120, in create
embeddings = self._embeddings.embed_documents([document.page_content for document in texts])
File "/app/api/core/embedding/cached_embedding.py", line 51, in embed_documents
raise ex
File "/app/api/core/embedding/cached_embedding.py", line 35, in embed_documents
embedding_result = self._model_instance.invoke_text_embedding(
File "/app/api/core/model_manager.py", line 95, in invoke_text_embedding
return self.model_type_instance.invoke(
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 33, in invoke
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [openai] Error: argument should be a bytes-like object or ASCII string, not 'list'
[2024-03-11 12:31:59,281: INFO/MainProcess] Processed dataset: b41d31bf-1279-46a9-8ad4-b37706b1f2a2 latency: 2.85326619999978
Have you restarted the celery service? @joginwen
My deployment uses Docker Compose. After updating the code from the main branch, I mounted the source directory over /app/api inside the containers, replacing the code shipped in the image.
docker-compose fragment:
version: '3.1'
services:
  # API service
  api:
    volumes:
      # Mount the storage directory to the container, for storing user files.
      - ./volumes/app/storage:/app/api/storage
      - /Users/liuda/Documents/work/ai/dify/api:/app/api
  worker:
    volumes:
      # Mount the storage directory to the container, for storing user files.
      - ./volumes/app/storage:/app/api/storage
      - /Users/liuda/Documents/work/ai/dify/api:/app/api
I ran docker-compose down followed by docker-compose up -d to restart, but the error still appears after testing.
May I ask if you have resolved it?
No, I upgraded to version 0.5.9 and still got the same error.
I have the same error and don't know how to fix it.
I have encountered the same issue as well. I discovered that it was caused by my proxy not sending the "encoding_format": "base64" parameter to OpenAI's /v1/embeddings endpoint. Below is the code I used to test it, with the response I received after including the encoding_format parameter, which can be parsed correctly.
import base64

import numpy as np

# Response body received from /v1/embeddings with encoding_format set to "base64"
req = {
    "data": [
        {
            "embedding": "",  # base64 string (payload left empty in this post)
            "index": 0,
            "object": "embedding"
        }
    ],
    "model": "text-embedding-ada-002",
    "object": "list",
    "usage": {
        "prompt_tokens": 421,
        "total_tokens": 421
    }
}

embedding_data = req['data'][0]['embedding']
# Decode the base64 string and convert it to a NumPy array of float32 type
decoded_array = np.frombuffer(base64.b64decode(embedding_data), dtype=np.float32)
print(decoded_array)

# The same decoding, written the way the line in the traceback does it
data = req['data'][0]
print(np.frombuffer(base64.b64decode(data['embedding']), dtype="float32"))
You can comment out the line extra_model_kwargs['encoding_format'] = 'base64' in the file api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py.
This is the result of my testing; I am not sure whether you are encountering the same issue.
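For anyone debugging a proxy like this, a decoder that tolerates both response shapes can help confirm which one is coming back. The following is only a sketch under assumptions: `decode_embedding` is a hypothetical helper written for this comment, not part of Dify and not the change made in PR #2781, and the sample values are invented.

```python
import base64
from typing import Sequence, Union

import numpy as np


def decode_embedding(value: Union[str, bytes, Sequence[float]]) -> list:
    """Return an embedding as a list of floats, whichever shape was received."""
    if isinstance(value, (str, bytes)):
        # base64 payload: float32 values packed back to back
        return np.frombuffer(base64.b64decode(value), dtype=np.float32).tolist()
    # Already a list of floats, e.g. when the proxy dropped encoding_format
    return [float(x) for x in value]


# Made-up sample: the same two values in both shapes.
packed = base64.b64encode(np.array([0.5, 2.0], dtype=np.float32).tobytes()).decode("ascii")
from_base64 = decode_embedding(packed)
from_list = decode_embedding([0.5, 2.0])
print(from_base64, from_list)
```

Both calls give the same values, so a check like this makes it easy to see whether the endpoint behind the proxy honors encoding_format.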
After upgrading to 0.5.9, the following error is reported when uploading a document to the knowledge base:
[2024-03-21 15:22:38,510: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[f2551a43-5607-4269-a61b-e1360c4c92f8] received
[2024-03-21 15:22:38,517: INFO/MainProcess] Start process document: 950e9171-108d-481a-9eee-da396574a68f
[2024-03-21 15:22:39,054: DEBUG/MainProcess] Created new connection using: 3075a88a36494dc3afd4685ad08b42fe
[2024-03-21 15:22:39,656: ERROR/MainProcess] RPC error: [insert_rows], <DataNotMatchException: (code=1, message=Attempt to insert an unexpected field to collection without enabling dynamic field)>, <Time:{'RPC start': '2024-03-21 15:22:39.653886', 'RPC error': '2024-03-21 15:22:39.656706'}>
[2024-03-21 15:22:39,657: ERROR/MainProcess] Failed to insert batch starting at entity: 0/11
[2024-03-21 15:22:39,657: ERROR/MainProcess] Failed to insert batch starting at entity: 0/11
[2024-03-21 15:22:39,657: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 70, in run
self._load(
File "/app/api/core/indexing_runner.py", line 667, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 121, in create
self._vector_processor.create(
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 75, in create
self.add_texts(texts, embeddings)
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 101, in add_texts
raise e
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 95, in add_texts
ids = self._client.insert(collection_name=self._collection_name, data=batch_insert_list)
File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 206, in insert
raise ex from ex
File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 198, in insert
res = conn.insert_rows(collection_name, insert_batch, timeout=timeout)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 501, in insert_rows
request = self._prepare_row_insert_request(
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 482, in _prepare_row_insert_request
return Prepare.row_insert_param(
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 422, in row_insert_param
return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 370, in _parse_row_request
raise DataNotMatchException(message=ExceptionsMessage.InsertUnexpectedField)
pymilvus.exceptions.DataNotMatchException: <DataNotMatchException: (code=1, message=Attempt to insert an unexpected field to collection without enabling dynamic field)>
[2024-03-21 15:22:39,663: INFO/MainProcess] Processed dataset: bd66a1d2-d871-42c4-8fe7-4275be32a591 latency: 1.1507785804569721