Error reported when uploading knowledge base file
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.3.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Uploading a file to a knowledge base fails with this error message: <ServerInternalError: (code=15112, message=filter index field type mismatch. Field name: metadata, input type: json, expect type: string)>
Error stack in worker pod:
2025-05-29 13:45:39,144.144 ERROR [Dummy-18] [indexing_runner.py:96] - consume document failed
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 80, in run
self._load(
File "/app/api/core/indexing_runner.py", line 570, in _load
tokens += future.result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/api/core/indexing_runner.py", line 625, in _process_chunk
index_processor.load(dataset, chunk_documents, with_keywords=False)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 78, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 173, in create
self._vector_processor.create(texts=texts, embeddings=embeddings, **kwargs)
File "/app/api/core/rag/datasource/vdb/tencent/tencent_vector.py", line 167, in create
self.add_texts(texts, embeddings)
File "/app/api/core/rag/datasource/vdb/tencent/tencent_vector.py", line 192, in add_texts
self._client.upsert(
File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/stub.py", line 432, in upsert
return self.vdb_client.upsert(
^^^^^^^^^^^^^^^^^^^^^^^
File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/vdbclient.py", line 55, in upsert
result: olama_pb2.UpsertResponse = self.rpc_client.upsert(request, timeout=timeout, ai=ai)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 68, in upsert
raise se
File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 65, in upsert
self._result_check(ret, ret.warning)
File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 556, in _result_check
raise ServerInternalError(code=code, message=msg)
tcvectordb.exceptions.ServerInternalError: <ServerInternalError: (code=15112, message=filter index field type mismatch. Field name: metadata, input type: json, expect type: string)>
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Please provide full error logs in the container.
Updated error stack @crazywoola
I found the problem. I upgraded Dify from 0.15.7 to 1.3.1, and knowledge bases created before the upgrade are incompatible with the new version.
Here is the difference between the two versions:
0.15.7 serializes metadata with json.dumps: https://github.com/langgenius/dify/blob/0.15.7/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L131
1.3.1 passes metadata as a dict: https://github.com/langgenius/dify/blob/1.3.1/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L187
Is this a breaking change?
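For anyone hitting the same error, here is a minimal sketch of the mismatch and one possible way to bridge it. It assumes that a collection created under 0.15.7 indexes `metadata` as a string while 1.3.1 sends a dict, as the error message suggests. The helper `coerce_metadata` and its `expected_type` argument are hypothetical names for illustration only; they are not part of Dify or tcvectordb.

```python
import json
from typing import Union


def coerce_metadata(metadata: Union[dict, str], expected_type: str) -> Union[dict, str]:
    """Return metadata in the shape the existing collection's index expects.

    expected_type is an assumption for this sketch:
    "string" for collections created by 0.15.7, "json" for ones created by 1.3.1.
    """
    if expected_type == "string":
        # Old collections: the field holds a JSON-encoded string.
        if isinstance(metadata, str):
            return metadata
        return json.dumps(metadata, ensure_ascii=False)
    if expected_type == "json":
        # New collections: the field holds a JSON object.
        if isinstance(metadata, str):
            return json.loads(metadata)
        return metadata
    raise ValueError(f"unsupported metadata field type: {expected_type!r}")


# Illustrative values only; real documents carry whatever metadata Dify attaches.
meta = {"doc_id": "abc", "dataset_id": "123"}
as_string = coerce_metadata(meta, "string")   # shape an old collection accepts
as_json = coerce_metadata(as_string, "json")  # shape a new collection accepts
```

How the expected type of an existing collection would be detected is left open here; recreating the knowledge base after the upgrade may be the simpler route.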
I am having an issue uploading "unsupported" file types to a knowledge base, such as ".json" and ".py" files, which are plain text just like ".txt".
I am not asking for a change to the file extension filter. However, the error message from the server is confusing, since it says something like "parameter missing". Could this be changed?
Request:
filepath=hello_world.py
curl --location --request POST --url "$api_base/datasets/$dataset_id/document/create_by_file" \
--header "Authorization: Bearer $api_key" --header "type:text/plain" \
--form 'data={"indexing_technique":"economy","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}};type=text/plain' \
--form "file=@$filepath"
Response:
{"code": "invalid_param", "message": "", "status": 400}
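Until the server message improves, a client-side check before the upload could give a clearer hint. This is only a sketch: `extension_error` is a hypothetical helper, and the `ALLOWED` list is a placeholder, not Dify's actual list of supported extensions, which should be taken from the documentation of the deployed version.

```python
from pathlib import Path
from typing import Iterable, Optional


def extension_error(filepath: str, allowed: Iterable[str]) -> Optional[str]:
    """Return a readable error if the file extension is not allowed, else None."""
    allowed_set = {ext.lower().lstrip(".") for ext in allowed}
    ext = Path(filepath).suffix.lower().lstrip(".")
    if ext in allowed_set:
        return None
    shown = f".{ext}" if ext else "(none)"
    return (
        f"unsupported file type {shown} for {Path(filepath).name}; "
        f"allowed: {', '.join(sorted(allowed_set))}"
    )


# Placeholder list for illustration; not the server's real filter.
ALLOWED = ["txt", "md", "pdf"]

problem = extension_error("hello_world.py", ALLOWED)
if problem is not None:
    print(problem)  # report the reason before calling create_by_file
```

Running this check before the curl request above would at least tell the user that the extension is the reason, instead of an empty "invalid_param" message.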
Hi, @leason00. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported a document indexing error in Dify v1.3.1 caused by a metadata field type mismatch after upgrading from v0.15.7.
- This change in metadata handling was identified as a breaking change affecting compatibility.
- You provided full error logs upon my request.
- Another user highlighted unclear error messages when uploading unsupported file types, suggesting better server feedback.
- The issue remains unresolved and points to challenges in upgrade compatibility and usability.
Next Steps:
- Please let me know if this issue is still relevant with the latest version of Dify by commenting here to keep the discussion open.
- If I don’t hear back within 15 days, this issue will be automatically closed.
Thank you for your understanding and contribution!