An error message is displayed after the knowledge base file is uploaded
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.5.8
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Create a knowledge base
- Upload an .md file and leave all policies at their defaults
- Wait a moment; the text embedding processing shows an Error state
Error message screenshot:
docker-worker-1 container error logs
[2024-03-09 16:20:56,842: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 31, in invoke
return self._invoke(model, credentials, texts, user)
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 71, in _invoke
embeddings_batch, embedding_used_tokens = self._embedding_invoke(
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in _embedding_invoke
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in <listcomp>
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/usr/local/lib/python3.10/base64.py", line 80, in b64decode
s = _bytes_from_decode_data(s)
File "/usr/local/lib/python3.10/base64.py", line 45, in _bytes_from_decode_data
raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'list'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 70, in run
self._load(
File "/app/api/core/indexing_runner.py", line 667, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 120, in create
embeddings = self._embeddings.embed_documents([document.page_content for document in texts])
File "/app/api/core/embedding/cached_embedding.py", line 51, in embed_documents
raise ex
File "/app/api/core/embedding/cached_embedding.py", line 35, in embed_documents
embedding_result = self._model_instance.invoke_text_embedding(
File "/app/api/core/model_manager.py", line 95, in invoke_text_embedding
return self.model_type_instance.invoke(
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 33, in invoke
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [openai] Error: argument should be a bytes-like object or ASCII string, not 'list'
✔️ Expected Behavior
The knowledge base can be used normally
❌ Actual Behavior
The uploaded document cannot be processed properly
Hey there @joginwen! Great to see you around again. Hope you're doing well!
Thank you for providing detailed information about the issue you're experiencing. Based on the error logs you've shared, it seems the error occurs during the text embedding process. Specifically, the message `TypeError: argument should be a bytes-like object or ASCII string, not 'list'` indicates that `base64.b64decode()` is receiving a list as an argument, whereas it expects a bytes-like object or an ASCII string.
This error might be due to the format or content of the uploaded .md file. It's possible that the text embedding function is encountering a list where it expects a string, causing the TypeError.
As a first step, I would recommend checking the content of the .md file you're trying to upload. Make sure that it doesn't contain any unusual formatting or data types that might be interpreted as a list during processing.
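The failure is easy to reproduce in isolation. If the `embedding` field arrives as a JSON list of floats rather than a base64 string (for example, because an intermediary drops the `encoding_format` parameter), passing it to `b64decode` raises exactly the error from the worker logs. A minimal sketch:

```python
import base64

# Passing a float list (instead of a base64 string) to b64decode reproduces
# the TypeError seen in the traceback above.
try:
    base64.b64decode([0.1, 0.2, 0.3])
except TypeError as exc:
    print(exc)  # argument should be a bytes-like object or ASCII string, not 'list'
```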
If the issue persists, you might want to try the solutions suggested in similar issues:
- Archiving and unarchiving your documents as suggested in this issue.
- Changing the volumes path as suggested in this issue.
Please try these steps and let us know if the issue still persists. We're here to help!
Sources
- Embedding model does not take effect (Embedding模型不生效)
- The document has been processed. Please refresh the page or go to the document details
- Error when importing text (导入文本报错)
- api/controllers/service_api/app/file.py
- web/app/components/base/icons/src/public/files/Csv.json
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
PR: https://github.com/langgenius/dify/pull/2781 has fixed this, thanks for your feedback @joginwen
After updating the code, I tested it again, and the same error occurred
[2024-03-11 12:31:59,277: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 31, in invoke
return self._invoke(model, credentials, texts, user)
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 71, in _invoke
embeddings_batch, embedding_used_tokens = self._embedding_invoke(
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in _embedding_invoke
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/app/api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py", line 184, in <listcomp>
return ([list(np.frombuffer(base64.b64decode(data.embedding), dtype="float32")) for data in response.data],
File "/usr/local/lib/python3.10/base64.py", line 80, in b64decode
s = _bytes_from_decode_data(s)
File "/usr/local/lib/python3.10/base64.py", line 45, in _bytes_from_decode_data
raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'list'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 71, in run
self._load(
File "/app/api/core/indexing_runner.py", line 674, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 120, in create
embeddings = self._embeddings.embed_documents([document.page_content for document in texts])
File "/app/api/core/embedding/cached_embedding.py", line 51, in embed_documents
raise ex
File "/app/api/core/embedding/cached_embedding.py", line 35, in embed_documents
embedding_result = self._model_instance.invoke_text_embedding(
File "/app/api/core/model_manager.py", line 95, in invoke_text_embedding
return self.model_type_instance.invoke(
File "/app/api/core/model_runtime/model_providers/__base/text_embedding_model.py", line 33, in invoke
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [openai] Error: argument should be a bytes-like object or ASCII string, not 'list'
[2024-03-11 12:31:59,281: INFO/MainProcess] Processed dataset: b41d31bf-1279-46a9-8ad4-b37706b1f2a2 latency: 2.85326619999978
Have you restarted the celery service? @joginwen
My deployment is based on Docker Compose. After updating the main-branch code, I mount the source directory into the container at /app/api, replacing the code inside the container.
docker-compose fragment
```yaml
version: '3.1'
services:
  # API service
  api:
    volumes:
      # Mount the storage directory to the container, for storing user files.
      - ./volumes/app/storage:/app/api/storage
      - /Users/liuda/Documents/work/ai/dify/api:/app/api
  worker:
    volumes:
      # Mount the storage directory to the container, for storing user files.
      - ./volumes/app/storage:/app/api/storage
      - /Users/liuda/Documents/work/ai/dify/api:/app/api
```
I then ran docker-compose down followed by docker-compose up -d to complete the restart; after testing, the errors still appear.
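For reference, this is the restart sequence I would expect to reload mounted code (a sketch assuming the compose file above, with the worker container's working directory at /app/api):

```shell
# Full restart so both the API and the Celery worker reload the mounted code
docker-compose down
docker-compose up -d

# Or restart only the worker container that runs the Celery consumer
docker-compose restart worker

# Check that the mounted file inside the worker actually contains the change
docker-compose exec worker grep -n "encoding_format" \
  core/model_runtime/model_providers/openai/text_embedding/text_embedding.py
```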
May I ask if you have resolved it?
No, I upgraded to version 0.5.9 and still got the same error.
I have the same error and don't know how to fix it.
I have encountered the same issue as well. I discovered that it was caused by my proxy not forwarding the "encoding_format": "base64" parameter to OpenAI's /v1/embeddings endpoint. I reproduced it with the following code; the response below, received after including the encoding_format parameter, can be parsed correctly.
import base64
import numpy as np
req = {
"data": [
{
"embedding": "347WvESoQbuQors7La/4u3ekPbwV3Kg8bOrHvPWHn7vPNzq8XUyQvDSr1DwleuA8nfQivFi9dzyd9KK8rnFnPBh0mrsoEtK7QoKZPJyVbbw1d808HsolPSoE87zjmJG7KgTzu3mDSjzp7hw8fI2FvAtpzrtSIFG7wKwWPAFoPby3TNC8WyHiu4oFlTtatB49B0z/O9nxrzqqoLk6yeEuu3J02jxoGZq7kW60PGRWejwiLhC8yBU2POhpvzyIbaO8qNTAvCLP2jt3XaK8oJ8ovHgpm7xFdDo6DO4rO6Y8zzyKBZW7b4K5PCpjKLxkVvq8TnVLO6qgOTzpj2e91K44PCTGgbrhbeM7RLtVPBjOSbysf8a7Cg+fPGZNIT2XahA9jD5RO2k/Qjvs4L08CL7IvCyJ0LwhvMY7T9SAvBPqhzy/J7k833tCvHRTZ7zrFMU8CYrBOihxhzsc0/67+rJ8PHJ0Wjl/fyY7KfHeO1b2hLtYHK089wd3POnbCDzG7w09DaeQus1YrTw3Dz+8OqewvBQQsLuPfJO7VSoMvd+O1rs6AeA7FjZYuwmKQTwA9vO7r+MwPYV2/Ds5Nee8alLWPFHGobymlv67QoKZvFBUWDufjBS8u3yzvJsoKrzkZAq7vDWYu/2pIzxhq/Q7DI92PCwvIbyzjjY8VBJyvEV0Orxbx7I8gz1AOuRkCj33U5i7uZ0mPL0BkTx5g8q8ZYEouhjOSTpqsYs7OmAVvbdf5Lw5Nec8A6H5PEZAs7o6SPu7G8Dquw8/gjqs2XU7dGuBO2D3lbxiMNK80JFpOwmKwbzXWT47puKfvJnX0zrTVIk8AQ4OOxV98zwBr9i8vs0JPF4YCTtck6s8+HnAPBzYBLpmOg28d12iPNCk/Tsfgwq8GgwMPUQai7u134y8PD8iu4KEW7r+Fmc8wniPu9u9qDwVfXO8GlMnOi2c5LyC45C8ppZ+vN5orrwiiD87YQqqPBzYBL3NWK28tSYouwjRXDtDTpI8EovSvABVKT1C3Eg9Wvs5vKzZ9bvruhW/k/ORuiEWdrym9bO8BYWMPEmRCbw2Q8Y8yeEuuy7HkrzhbeO86e4cvGpSVjy7Hf65iZPLvOqngbxJ67i8oPlXvG5vpbqbb8W8zIy0OwbyT7z1dIs4wniPu1tofTpxB5c6y2aMvOXRzTwWlY27/nUcvNQIaDwT6oe8Fdyou5InGTwC2ga9Z6dQPWt9BDswR+q7ZA9fPEKCGTt8iP88hMIdvEtwFrxNTyM8qBtcu9aNxbwxWn48B1EFPILLdrx3XaK88EQou5zhjjxkbpS8wixuu5LI4zwJQyY8bf3bO3LTDz1lyEO9sJyVvBKLUjwlwXu8DyfoPE4bHLyoway81MFMPHLTD7tIH8C8ZFZ6PPzdqjwV3Kg70ogQOx63kTyECTk76e4cvMNEiLxN8G08Sl0CPKXPi7utS7+8xwIiPURhprsUEDA8fI0FvJdqkLxTpa473ONQPMr0wryd9KK8ulYLPYhtI7zD5dI8cO98vFkvQbvFIxW88tyZPM1FGT3Q8B68rJJavG1ckbww7bq8SUXou6S38by1x/I7E55mPCXZFT0FhQw8OIEIvfI2SbzsOm083Zy1vESowboYKPk7pjxPPOnuHLws6AU7oFgNvX2gmTuNsJo7bygKPX7GQbv56wk80oiQPAaYoDzkqyW80RbHPKFroTwBDg48bOpHvOG5hDy3qwW9OmAVu8zTT7sYFWU8IE8DvK8qzDy5ipK85TADPZLI47uq+mi81SACvPPvLTyNUWU8B1GFPFkvQbw3/Co8snuivPtrYbthHb45A+2aO8aQ2Dt2MvQ7uYqSvGFk2TvBv6o8RBoLvNvQvLuuvQg905ukvPcHd7xdTJC8l/jGOvcH9zuQ/Oq8VvaEvLI0h7w7cym8i9ENPEZAM7x/OAu9+v6dvELcyLwuIcK8wE1hvAnk8DxOdcu8gbjivIG4YrsUV0u8YP
eVPFWEO7yeGks7TfBtPCt2vLyNnYa8exu8PMBN4TvGSb08eTyvPCyJ0Lxp5ZI8Utk1vYLjEDwocYe8wQbGPKqzTTxwqGG7hujFPI5k+bl1fpW6tcfyPLmKEj3QpP06Weglulb2BLyaXDE8e8GMvA7NODvkqyW8HNgEPSmXr7xaoYo7jh3evAOhebwQUpa8gst2POvNqTwt+5k7AnvRPKXPC7zUZ508jfe1uwFoPTtxYcY8XnK4OyUgsTsp8V48DaeQvI2wmjy8NRg8o0quuxNEN7x7G7y691OYO+Faz7sVffM8bUT3u0bhfTxIxZC80KkDPLsdfjvjmBE8ZwaGPNG8Fz2cO7684zncPGWBqDxdpr87VBLyO8CZgrylg2q7OZScO6Y8zzyiJIa8DUjbOzET4zwKD5+7APbzO3ddIj3eaK48ksjjPO8xFLyX+Ea6qrNNOjCmH7yUYNU8iG0jPCRnTLvow2457fNRvEXOabwhYhc9RuF9u2EdvjvpNTi8C2lOvBcC0TsV3Kg8Bt87PP+IMLzP3Yo8lp4XvRdhhrtDNvg8ocXQO86yXDjOEZK7bf3bvDFyGD1OvOY76kjMPIV7grvmijI8swAAPS/aJrtxYca8Ii4QuyscjTwohBu8XIAXvWt9hDxNT6M7bUR3ur/gHb1A/Tu97fPRPBXJlLy7aZ+7hdUxvVWEOzz7ypa74t+sumIwUroSeL66NorhvEV0ujqAS5+7I0EkvaVw1rtOGxw9OZScOq69CDyBXrO78n3kvAtpzjrNn8i5I/oIPfEQoTw7ukS8xFccvLsdfrvfe8K8iCaIveu6lTzP3Qq9eMplPDM+ET3FxN+73sLdO/ZAhDxG+Zc8OZQcu60EJLxXCZk7F2GGO6QWp7wwR+q81kYqPGmGXTz4ecA7wRlaO/dmLD2v0Jw8I+LuOyyJ0LzmncY7+UU5vFnoJTzUrrg8uYqSvJOU3DzfNKe8zrJcu0RhpjuOZHk8Qsm0PEdmWzx+xkE7SrcxvJFutDwbHyC8C2lOPEpdgjxkD1+8VwmZO44dXjw3Dz+7KgTzvDXWgrx7Yle8RAJxO2+CubxkW4C8gEufO9BKzrzT9dO8VYS7vOG5hDxSf4a8ZFZ6vBRXSzzf1XG8L3txvC7HkrvKTnK6gReYuihxB7xZ6CW8v+Adu6OkXTrPJCY8a32EPJf4xrvMMgU90KT9PIiAt7zjTPC8EFIWO0/P+rxVcac505skPF4Yibxw73y7oFiNuuwnWTym4p87FSPEu5InmTwWSWy8oiQGvD8xQzxZ6KW8UcahOyaljryv47A79vRiuztzKb2eeQC8nOEOvflFOTyT8xE7ZdtXPHoIqDzEV5y8RYfOvBt5zzzc49C8Dzr8O9cSIzyzRxs8pvUzPFb2BLzpNbg83ImhvIzkobuCy3Y8A6H5vO1Shzwvkws8C8gDvZoCAj3YbFK5sQnZvDqnMDyao0y82fEvvPzdKjxcgBc8EPNgvJh9JLyw9sS8DnOJPD4eL7wqBHO8kzqtvKFrIb0T6ge8TrxmO2vEH7y4K908mMS/vOnuHLu1Jqg8xSMVvDSr1DwIvsg8bDFjPF2mv7ylKTu67h6AvBIxI71A6qe8Yav0ursihDwt+5k8A47lPFUqjLyZSZ08eggoO57Am7sf3Tm7TrzmvJEUBbuXsau8cWFGPM/dijwHUYU8NdYCvFOSmjnqATG7WeglO/OoEjxlgag7dX4VvMIsbr0ohJs82ZcAunYy9Lt7wQw811k+vcvAu7zj8kA8sPbEvGEKKj2uXlO8wKwWPQBVqbzbdo08DO6rvOETtDwSMaM8luWyPEV0Orx857S7nmHmPFmJ8DwbZrs7PZnRu7tpn7y7HX47qxKDPEOVrbzp7py7gKXOuzFa/rzfNKe8H4OKvHyI/7sbHyC9alJWPDZDRry28iC75TCDPBdhhrwYFWU8zVitPNCRaTsNAUA9fzgLvI98Ez3+dRw9RvkXPJw7vrxl21e9sw
AAvKD51zpbx7I8WS9BPYahKjx85zQ8MXKYPLI0hzsAVSm8Aa/YvHgpmzs4ae48rDirPABVKbyLcli8GYeuOheooTxNqdK7Rc7pvM2fSLtnYLW8YPcVu1jVET0hvMa7kVugPIdaD7xoGZq89Lumuzaddbzd9mQ7Xaa/u0+I37yxryk9g/YkuyoE8zs/i3I8Gx8gPD34Brw5NWe8HzdpPH+SOj0OuqQ77IYOvMJ4jzx0spy7l1L2PIaOljxPz/o7PmXKvAaYoLzUZx28+UU5O51O0jzqpwG8ZLWvO3Uf4Dt/kjq8gsv2vAnk8DqBuOI8iTkcPOUY6bym4p+8q8bhvFi99zwXAtE68WrQu9xChrst+xk9gReYvKBYDT2W5bI8dFNnPJ/Tr7yaAgK7RuYDuXQMzLzwRCg9NkPGvO/SXryDUNS8bhBwPH7Gwbv8lo88OqcwPKLYZD1Cgpk8HOuYu65e07wwpp+6DhRUO8bvDb0et5E8rDgrO7A9YLx71KA8r9CcvLdf5LvQqQM8ugrqvOPywLzkqyW8Hf4sPRNEt7y7aZ+8ojcavDf8qrzSiJC8Eb9Zu6J+NTobZjs7ikwwPXr1k7zuHgC8ksjjPIpMsLy/Jzm8uNGtvG62QLy5nSa9SZGJvGHDDrwVyZQ8egiou/8uAb1f5AG9g1DUvEyWPr1eX6S85oqyvLXH8jyCy/Y6WL13OlD6qDzkZAo8CL7IuyiEmztf5IG8RXS6O1jVkbv/LgG9VvYEPVsOTronuCK9T9SAPKORyTs8LI462ZL6ulh23DwK/Ao8WL13vB836bugWA07YQqqvJKByLwDR0o87mUbvFHGoTyp59Q8mv37u2mZ8bwQUhY8Zk2hO6Pr+DtatB48Pazlu2r4JrwVyZQ8yG/lu9DwHrzXEqM7y3mgu70BET06TQE7pBanPNLiPzy/4J088ESovKD517uo1MA8FX1zvGCY4Ds6YJW8qyWXPEjFkLvM5mO8HOsYvLvDzjqgn6i8fUHkOpFutLslwXu8Norhuw5bbzx7G7y8uHL4O7oK6jv5jFQ8jbAavIQJubt8jQU8sWiOPOaKsrzVMxa9KmMoPBvA6jzqpwE7sjSHPhbvvDs1d827P4vyPB63kTnW5/Q8fmwSPaPr+DzP3Qo9nOEOPaw4q7wp3sq8XnK4ubJ7Iryd9KK7GCj5vHyNBb2lz4u8l/jGvE4bHLvuGfo6jbAaPOKAd7xZ6KW8evUTPAoPnzvnD5C7lp4Xu8Is7jzxyQU9KxwNvS1VyTx+xsE6nhpLPFh23LuXUvY6CyKzOuzgPTsEE8M8ZYEou2gZmjy3vpm7t0zQO4zkobw1d008XhiJvKCfKLywnBW9wKwWOkaa4juzocq7YnftOzFyGDxJ6zg9TIOqu2A+MTx53Xm7mMS/uywvoTogT4M7tcfyvBNENz1DNvi7f3+muwdMf7u1Jqg8ed15vE/UgLvruhU8LiFCvM35d7qdrQe9NTAyvEOVLbvbdg29T+cUvSu91zsiLpA8NR0ePXCVTT1lgai8U5Kau0G2oLsJ5HA3Sl0CPM5rQbwOzbg8lj/iOybsqbwidSs8xkm9O+ImyLuL0Y28+p9ovIOX77uaXDE8+YxUPChxBztjohs8bJCYu2RbAL3KTnI9RkCzPAHCbLzCi6O7Y0Pmu77NibpkWwA9qyUXPPUo6rzhuYQ8S8pFvT4eLzy/gWi8E55mvN2ctTvEV5w8rfGPvHk8r7v564m8jmT5PCqqw7xbbYO8UQ29O+AAoLzqSMy7JMYBPOS+uTvbF9g76Y/nvMY2qTwJQya9CvwKPbpWC73ATeG6A+2aumP8SjwNAcC7TN3ZvKeumDxTkho8goRbu0/P+juUvwq82gTEPNbn9Dtj/Eo8RXQ6vC/apjwfJFW8E0S3vJk2ibzFxF+7BcynOgIhojz7ETK6zyQmvTKFrLwkDR090KR9PDmUHL3Ci6O8Ty4wPRgV5ToqBHO8vQGRPK1LP74UELA8cK
jhO3pPw7uUvwo7LOiFPG/c6DhP1IA7h1qPvK/jMLsxX4Q85uRhvFD6qLxCI+S7OIEIPbCclTy6Vgu9s0ebPOvNKTz4eUC7o0ouPeUY6bztmaK8NdaCO3yNBT1dTJA8Mz4Rva8qTDzgRzu8JXrgvN1VmrwSeD68mqPMPM/dCjyqWR48LEI1vEq3sbxw9II8YR2+vHPmIzzHu4Y8ho6WuvPvLTyTTcE8SUVovHf+bLv98D48I5vTuzM+ETsD7Zo7L9qmO/1K7rwAQpU85ooyPD1Stjyowaw7sjSHPCV6YDxpP0K8GUCTuXjK5TuYfaS80PAePKaW/rsgT4O8V1C0vAoPH72DUFQ8z92KvEbhfTxqsQs6r9Acva1Lv7vzqJK8bJAYPeFt4zwXu7W8/mKIu/q3Aj3VerG8UWfsOwJ7UTt4KZu6+yTGPDw/IjzxEKE7RGEmPIiAN7shA+I80oiQvLcFtTzqpwG9FcmUub0BEb3qSMw8eBYHvEjFEDw3/Cq8MEdqOzpglbydTtK7UbONuou5c7xA6qc7ulaLu58tX7sXu7U8/8/LPPCe1zxZifA7e9Qgve2ZojwlwXs8YPcVPajUwLvb0Dw8G8BqvGY6jbvOsly8/fA+PKJ+tTt69RM86ltgPFyAlzzBGVq7ZLUvPNUggr1W9oS7puIfPfjAWzz29GK8P9eTPNmS+jqoG1w8JkbZvIDs6TtPz/q8tBMUvWD3lbyeeQC845iRPIboxbz24c46aBkavNLivzvXEiM9nmHmusSxy7yHWg+7tBOUvNmS+rvRXWK7bOpHvJsoKjzp24g8uT5xPAbfOz3f1XG8dLIcvEky1LvueC+8B2SZvJsoqrwnuCK81/8OPT34BryJORw8Eb/ZPFBU2DuOHd684O2Lu5dS9rzsOu28Gq3WPGWBKLx8iH+8bhBwO0bhfTqg+de7nIJZvO4ZejyJ2mY8/nWcvFgcrTx0DMy8wKwWO8359zvcKmy8na2HvCnxXjxCyTQ5gKXOPI4dXjv7a+G7yeGuO9EDszvQSk68OmCVuVLZtbwVI8Q4tSYovcbvjbz0u6Y8TIOqvB1YXDzLGmu8Y0PmvB836bzld548RuaDvFbe6jtid+07+rcCPAcF5Du1bcM7xLHLvMCsljzW53Q8nnmAu0NOErzRAzO89BVWvBXcKLygQPM8AEKVvOw67TzGST07YWTZvNV6sb1oGZo8DacQPODtCz3BBsY84O0LvFS4QjsxubO8XICXO4oFFTw4gYi8WBwtPOS+ObpnBoY8zIy0vFqhiromRtm4sJyVPFOlrjwYKPk7LseSPCXZlbyYxD880EpOPF4YCTzRFsc7mZA4vNCk/Twjm1O8j3yTvGen0LuZNom8Iog/u5r9ezxMPA+8NKtUveXRzbt2kSk89kCEPDAAzzyL0Q29B0x/vAi+SDxt/du7fzgLvN80pzxZQlW8PZnRO65e0zty0w+9wKyWPPUoarsTRDe9pvUzvH2gmbx71CC9tvIgPHr1E7uP1kK7fyDxvPdmLDxbIWI8Fu+8PO4eALxjohs8JkbZO5r9e7x0DMy7TmI3OzQKCr1HU8e8X8xnvFlCVbudBzc8td8MOvLcmTy0E5Q87ZmiOp4aS7yPfBM8a30EPS2v+DtG+Re96270O91VGro26RY9zyQmvLpWC7t5PK+8Okh7Om+CObylcFY8ed35O1VxJzu2Oby8IbxGPd/VcbyfLV87IE+DOhgoebtmlDw879JeO/Ijtbs4yKO80PCevEpdArwxE+O7o5FJvJUZOryiJAY9WqGKvPqy/DcardY8urC6PHpPwzs9Uja7NAqKPPq3Ar30uya9kVugu6WD6jypRoo8VBLyOpf4xjrd9mQ8XJOrPK/QHLt3/uy8mv17OYLjELx9oBm8UiDROqyS2jv8lo+8cKjhvK/QnDxfK528ags7PEbh/bsK/Io9arELPXyNhbv83aq6jIVsvJD86juBXjO7kW40u9EDs7zFxN+8Nw
+/PDk15zuKTDC8VSoMvQD2c7yXUnY7ZpQ8vGY6jbxv3Oi8s6FKPNwq7DwQmTG81K44O/I2SbkrHA29pAMTvHmWXjyq+mg8x1xRvE68ZrykA5O8zsXwPKQWJ7tVcae8Y0NmO0M2eDy8NZi8kQ//uw8n6DzRvBc8fIj/ubk+8Txw7/y85XcevW1cETxfhcy64VrPOymXL7ypLnA5",
"index": 0,
"object": "embedding"
}
],
"model": "text-embedding-ada-002",
"object": "list",
"usage": {
"prompt_tokens": 421,
"total_tokens": 421
}
}
embedding_data = req['data'][0]['embedding']
# Decode the base64 string and convert it to a NumPy array of float32 type
decoded_array = np.frombuffer(base64.b64decode(embedding_data), dtype=np.float32)
print(decoded_array)
data = req['data'][0]
print(np.frombuffer(base64.b64decode(data['embedding']), dtype="float32"))
As a workaround, you can comment out the line extra_model_kwargs['encoding_format'] = 'base64'
in the file api/core/model_runtime/model_providers/openai/text_embedding/text_embedding.py.
This is the result of my testing; I am not sure whether you are encountering the same issue.
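Another option, instead of disabling base64 encoding, is to accept either response format on the client side. This is only an illustrative sketch, not Dify's actual code; parse_embedding is a hypothetical helper:

```python
import base64
import struct

def parse_embedding(embedding):
    """Hypothetical helper: accept either embedding response format.

    With encoding_format='base64' the API returns a base64 string packing
    float32 values; without it (or through a proxy that drops the parameter)
    the embedding arrives as a plain JSON list of floats.
    """
    if isinstance(embedding, list):
        return [float(x) for x in embedding]
    raw = base64.b64decode(embedding)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# 0.25/0.5/0.75 are exactly representable in float32, so round-tripping is lossless.
packed = base64.b64encode(struct.pack("<3f", 0.25, 0.5, 0.75)).decode("ascii")
print(parse_embedding(packed))             # [0.25, 0.5, 0.75]
print(parse_embedding([0.25, 0.5, 0.75]))  # [0.25, 0.5, 0.75]
```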
After upgrading to 0.5.9, the following error is reported when uploading a document to the knowledge base:
[2024-03-21 15:22:38,510: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[f2551a43-5607-4269-a61b-e1360c4c92f8] received
[2024-03-21 15:22:38,517: INFO/MainProcess] Start process document: 950e9171-108d-481a-9eee-da396574a68f
[2024-03-21 15:22:39,054: DEBUG/MainProcess] Created new connection using: 3075a88a36494dc3afd4685ad08b42fe
[2024-03-21 15:22:39,656: ERROR/MainProcess] RPC error: [insert_rows], <DataNotMatchException: (code=1, message=Attempt to insert an unexpected field to collection without enabling dynamic field)>, <Time:{'RPC start': '2024-03-21 15:22:39.653886', 'RPC error': '2024-03-21 15:22:39.656706'}>
[2024-03-21 15:22:39,657: ERROR/MainProcess] Failed to insert batch starting at entity: 0/11
[2024-03-21 15:22:39,657: ERROR/MainProcess] Failed to insert batch starting at entity: 0/11
[2024-03-21 15:22:39,657: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
File "/app/api/core/indexing_runner.py", line 70, in run
self._load(
File "/app/api/core/indexing_runner.py", line 667, in _load
index_processor.load(dataset, chunk_documents)
File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 59, in load
vector.create(documents)
File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 121, in create
self._vector_processor.create(
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 75, in create
self.add_texts(texts, embeddings)
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 101, in add_texts
raise e
File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 95, in add_texts
ids = self._client.insert(collection_name=self._collection_name, data=batch_insert_list)
File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 206, in insert
raise ex from ex
File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 198, in insert
res = conn.insert_rows(collection_name, insert_batch, timeout=timeout)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
raise e from e
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
raise e from e
File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 501, in insert_rows
request = self._prepare_row_insert_request(
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 482, in _prepare_row_insert_request
return Prepare.row_insert_param(
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 422, in row_insert_param
return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 370, in _parse_row_request
raise DataNotMatchException(message=ExceptionsMessage.InsertUnexpectedField)
pymilvus.exceptions.DataNotMatchException: <DataNotMatchException: (code=1, message=Attempt to insert an unexpected field to collection without enabling dynamic field)>
[2024-03-21 15:22:39,663: INFO/MainProcess] Processed dataset: bd66a1d2-d871-42c4-8fe7-4275be32a591 latency: 1.1507785804569721
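This later DataNotMatchException looks like a separate failure: Milvus rejects rows containing a field that the collection schema does not declare unless the collection was created with enable_dynamic_field=True. A stdlib-only sketch of the mitigation on the insert side, dropping unexpected keys before calling insert (SCHEMA_FIELDS and strip_unexpected_fields are illustrative names, not Dify's actual schema or code):

```python
# Illustrative only: the field names below are assumed, not Dify's real schema.
SCHEMA_FIELDS = {"id", "vector", "page_content", "metadata"}

def strip_unexpected_fields(row):
    """Drop keys the collection schema does not declare; a collection created
    with enable_dynamic_field=True would otherwise absorb them dynamically."""
    return {k: v for k, v in row.items() if k in SCHEMA_FIELDS}

row = {"id": "1", "vector": [0.1, 0.2], "page_content": "hello",
       "sparse_vector": []}  # an undeclared key like this triggers the exception
print(sorted(strip_unexpected_fields(row)))  # ['id', 'page_content', 'vector']
```

The alternative is to drop and recreate the collection with dynamic fields enabled, at the cost of reindexing.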