
Error reported when uploading knowledge base file

Open leason00 opened this issue 6 months ago • 3 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.3.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Uploading a file to a knowledge base fails with the following error message: <ServerInternalError: (code=15112, message=filter index field type mismatch. Field name: metadata, input type: json, expect type: string)>

Error stack in worker pod:

2025-05-29 13:45:39,144.144 ERROR [Dummy-18] [indexing_runner.py:96] - consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 80, in run
    self._load(
  File "/app/api/core/indexing_runner.py", line 570, in _load
    tokens += future.result()
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/indexing_runner.py", line 625, in _process_chunk
    index_processor.load(dataset, chunk_documents, with_keywords=False)
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 78, in load
    vector.create(documents)
  File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 173, in create
    self._vector_processor.create(texts=texts, embeddings=embeddings, **kwargs)
  File "/app/api/core/rag/datasource/vdb/tencent/tencent_vector.py", line 167, in create
    self.add_texts(texts, embeddings)
  File "/app/api/core/rag/datasource/vdb/tencent/tencent_vector.py", line 192, in add_texts
    self._client.upsert(
  File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/stub.py", line 432, in upsert
    return self.vdb_client.upsert(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/vdbclient.py", line 55, in upsert
    result: olama_pb2.UpsertResponse = self.rpc_client.upsert(request, timeout=timeout, ai=ai)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 68, in upsert
    raise se
  File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 65, in upsert
    self._result_check(ret, ret.warning)
  File "/app/api/.venv/lib/python3.12/site-packages/tcvectordb/rpc/client/rpcclient.py", line 556, in _result_check
    raise ServerInternalError(code=code, message=msg)
tcvectordb.exceptions.ServerInternalError: <ServerInternalError: (code=15112, message=filter index field type mismatch. Field name: metadata, input type: json, expect type: string)>

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

leason00 avatar May 30 '25 03:05 leason00

Please provide full error logs in the container.

crazywoola avatar May 30 '25 05:05 crazywoola

Please provide full error logs in the container.

Updated error stack @crazywoola

leason00 avatar May 30 '25 06:05 leason00

I found the problem. I upgraded Dify from 0.15.7 to 1.3.1, and knowledge bases created under the old version are incompatible with the new one.

Here is the difference between the two versions:

0.15.7 serializes metadata with json.dumps: https://github.com/langgenius/dify/blob/0.15.7/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L131

1.3.1 passes metadata as a dict: https://github.com/langgenius/dify/blob/1.3.1/api/core/rag/datasource/vdb/tencent/tencent_vector.py#L187

Is this a breaking change?
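
To make the mismatch concrete, here is a minimal sketch of the two metadata shapes. It assumes only what the error message states: a collection created under 0.15.7 expects `metadata` as a string, while 1.3.1 sends a dict (JSON). The helper names `encode_metadata` and `decode_metadata` and the sample keys are hypothetical and are not part of Dify or tcvectordb.

```python
import json
from typing import Any


def encode_metadata(metadata: dict[str, Any], expected_type: str) -> Any:
    """Return metadata in the shape the collection's `metadata` field expects.

    Hypothetical helper: "string" mirrors the 0.15.7 behaviour (json.dumps),
    "json" mirrors the 1.3.1 behaviour (the dict is passed through unchanged).
    """
    if expected_type == "string":
        return json.dumps(metadata)
    if expected_type == "json":
        return metadata
    raise ValueError(f"unsupported metadata field type: {expected_type!r}")


def decode_metadata(stored: Any) -> dict[str, Any]:
    """Read metadata back regardless of which shape was stored."""
    if isinstance(stored, str):
        return json.loads(stored)
    return dict(stored)


# Sample keys are illustrative only.
sample = {"doc_id": "abc", "dataset_id": "d1"}
as_string = encode_metadata(sample, "string")  # shape an old collection accepts
as_json = encode_metadata(sample, "json")      # shape that triggers code=15112 on it
```

Under these assumptions, writing the dict shape into a collection whose field was created as a string is what produces "input type: json, expect type: string".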

leason00 avatar May 30 '25 06:05 leason00

I am having an issue uploading "unsupported" file types such as ".json" and ".py" to a knowledge base, even though these files are plain text just like ".txt".

I am not asking for a change to the file extension filter. However, the error message from the server is confusing: it reads like a "parameter missing" error rather than an unsupported file type. Could this be changed?

Request:

filepath=hello_world.py

curl --location --request POST --url "$api_base/datasets/$dataset_id/document/create_by_file" \
--header "Authorization: Bearer $api_key" --header "type:text/plain" \
--form 'data={"indexing_technique":"economy","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}};type=text/plain' \
--form "file=@$filepath"

Response:

{"code": "invalid_param", "message": "", "status": 400}
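
As a client-side workaround until the server message improves, a small pre-flight check can produce a clearer error before the request is sent. This is only a sketch: `build_data_field` reproduces the `data` form field from the request above, while `check_extension` and the allowed-extension set passed to it are hypothetical, since the server's actual extension list is not shown in this thread.

```python
import json
from pathlib import Path
from typing import Optional


def build_data_field(separator: str = "###", max_tokens: int = 500) -> str:
    """Build the JSON string sent as the `data` form field in the request above."""
    payload = {
        "indexing_technique": "economy",
        "process_rule": {
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": True},
                ],
                "segmentation": {"separator": separator, "max_tokens": max_tokens},
            },
            "mode": "custom",
        },
    }
    return json.dumps(payload)


def check_extension(filepath: str, allowed: set[str]) -> Optional[str]:
    """Return None if the extension is allowed, else a descriptive message.

    `allowed` is supplied by the caller; it is an assumption, not the
    server's real list.
    """
    suffix = Path(filepath).suffix.lower()
    if suffix in allowed:
        return None
    return f"unsupported file type {suffix!r}; allowed: {sorted(allowed)}"
```

Calling `check_extension("hello_world.py", {".txt", ".md"})` returns a message that names the rejected extension, which is the kind of feedback the empty `"message": ""` response lacks.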

James4Ever0 avatar Jul 04 '25 11:07 James4Ever0

Hi, @leason00. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You reported a document indexing error in Dify v1.3.1 caused by a metadata field type mismatch after upgrading from v0.15.7.
  • This change in metadata handling was identified as a breaking change affecting compatibility.
  • You provided full error logs at a maintainer's request.
  • Another user highlighted unclear error messages when uploading unsupported file types, suggesting better server feedback.
  • The issue remains unresolved and points to challenges in upgrade compatibility and usability.

Next Steps:

  • Please let me know if this issue is still relevant with the latest version of Dify by commenting here to keep the discussion open.
  • If I don’t hear back within 15 days, this issue will be automatically closed.

Thank you for your understanding and contribution!

dosubot[bot] avatar Aug 29 '25 16:08 dosubot[bot]