
Different LLMs produce different errors

Open oldunclez opened this issue 11 months ago • 4 comments

With deepseek-r1-32b, the error is "dimension mismatch". With deepseek-r1-7b, the error is "SyntaxError: invalid character ',' (U+FF0C)". The script (mytest.py) and the output of each run follow.

from deepsearcher.configuration import Configuration, init_config
from deepsearcher.online_query import query

config = Configuration()

# Customize your config here,
# more configuration see the Configuration Details section below.
config.set_provider_config("llm", "OpenAI", {"model": "deepseek-r1-32b", "base_url": "http://192.168.23.10/v1"})
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "bge-m3", "base_url": "http://192.168.23.10/v1","dimension": 1024})
init_config(config = config)

# Load your local data
from deepsearcher.offline_loading import load_from_local_files
load_from_local_files("/tmp/cl.txt")


# Query
result = query("出差住亲戚家可以报销多少") # Your question here
root@df017718c0f5:/deep-searcher# python  mytest.py
Loading files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 17.54it/s]
Embedding chunks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.93s/it]
<think> Select agent [ChainOfRAG] to answer the query [出差住亲戚家可以报销多少] </think>

>> Iteration: 1

<think> Perform search [<think>
好吧,用户的问题是关于出差住在亲戚家可以报销多少。首先,我得弄清楚中国的出差住宿报销政策是什么样的。一般来说,报销是根据公司的政策和当地的住宿标准来定的,但住亲戚家可能不算在内。所以,我需要知道用户的具体情况,比如他们所在的城市,这样才能查找当地的住宿标准。

然后,用户可能想知道住亲戚家是否有报销的可能性,或者是否有其他报销方式。比如,如果住在亲戚家,是否可以申请其他形式的补贴,或者是否有例外情况可以报销。所以,问清楚是否住在亲戚家附近,或者是否有其他需求,可以帮助更好地回答。

最后,用户可能希望得到明确的数字,比如具体的报销金额。所以,了解这些细节有助于给出准确的答案。
</think>

出差住宿的具体城市是哪里?] on the vector DB collections: ['deepsearcher'] </think>

<search> Search [<think>
好吧,用户的问题是关于出差住在亲戚家可以报销多少。首先,我得弄清楚中国的出差住宿报销政策是什么样的。一般来说,报销是根据公司的政策和当地的住宿标准来定的,但住亲戚家可能不算在内。所以,我需要知道用户的具体情况,比如他们所在的城市,这样才能查找当地的住宿标准。

然后,用户可能想知道住亲戚家是否有报销的可能性,或者是否有其他报销方式。比如,如果住在亲戚家,是否可以申请其他形式的补贴,或者是否有例外情况可以报销。所以,问清楚是否住在亲戚家附近,或者是否有其他需求,可以帮助更好地回答。

最后,用户可能希望得到明确的数字,比如具体的报销金额。所以,了解这些细节有助于给出准确的答案。
</think>

出差住宿的具体城市是哪里?] in [deepsearcher]...  </search>

2025-03-06 07:03:51,698 [ERROR][handler]: RPC error: [search], <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 6144, actual 1024.: segcore error)>, <Time:{'RPC start': '2025-03-06 07:03:51.496238', 'RPC error': '2025-03-06 07:03:51.698903'}> (decorators.py:140)
2025-03-06 07:03:51,699 [ERROR][search]: Failed to search collection: deepsearcher (milvus_client.py:414)
2025-03-06 07:03:51,699 - CRITICAL - fail to search data, error info: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 6144, actual 1024.: segcore error)>
Traceback (most recent call last):
  File "/deep-searcher/deepsearcher/vector_db/milvus.py", line 113, in search_data
    search_results = self.client.search(
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py", line 415, in search
    raise ex from ex
  File "/usr/local/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py", line 400, in search
    res = conn.search(
          ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 836, in search
    return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 777, in _execute_search
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 766, in _execute_search
    check_status(response.status)
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/utils.py", line 64, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 6144, actual 1024.: segcore error)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/deep-searcher/mytest.py", line 22, in <module>
    result = query("出差住亲戚家可以报销多少") # Your question here
             ^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/online_query.py", line 10, in query
    return default_searcher.query(original_query, max_iter=max_iter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/rag_router.py", line 69, in query
    answer, retrieved_results, n_token_retrieval = agent.query(query, **kwargs)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 196, in query
    all_retrieved_results, n_token_retrieval, additional_info = self.retrieve(query, **kwargs)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 178, in retrieve
    intermediate_answer, retrieved_results, n_token1 = self._retrieve_and_answer(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 124, in _retrieve_and_answer
    retrieved_results = self.vector_db.search_data(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/vector_db/milvus.py", line 133, in search_data
    log.critical(f"fail to search data, error info: {e}")
  File "/deep-searcher/deepsearcher/tools/log.py", line 89, in critical
    raise RuntimeError(message)
RuntimeError: fail to search data, error info: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 6144, actual 1024.: segcore error)>

With deepseek-r1-7b, the error is:

root@df017718c0f5:/deep-searcher# python  mytest.py
Loading files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 21.01it/s]
Embedding chunks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.05s/it]
<think> Select agent [ChainOfRAG] to answer the query [出差住亲戚家可以报销多少] </think>

>> Iteration: 1

Traceback (most recent call last):
  File "/deep-searcher/deepsearcher/llm/base.py", line 44, in literal_eval
    result = ast.literal_eval(response_content.strip())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 2
    好,我现在要帮助用户解决关于出差住亲戚家可以报销多少的问题。首先,我需要明确用户的主要需求是什么。用户想知道在住亲戚家的情况下,出差可以报销的金额是多少。这可能涉及到公司或单位的报销政策,或者是个人财务方面的安排。
     ^
SyntaxError: invalid character ',' (U+FF0C)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/deep-searcher/mytest.py", line 22, in <module>
    result = query("出差住亲戚家可以报销多少") # Your question here
             ^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/online_query.py", line 10, in query
    return default_searcher.query(original_query, max_iter=max_iter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/rag_router.py", line 69, in query
    answer, retrieved_results, n_token_retrieval = agent.query(query, **kwargs)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 196, in query
    all_retrieved_results, n_token_retrieval, additional_info = self.retrieve(query, **kwargs)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 178, in retrieve
    intermediate_answer, retrieved_results, n_token1 = self._retrieve_and_answer(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 115, in _retrieve_and_answer
    selected_collections, n_token_route = self.collection_router.invoke(query=query)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/collection_router.py", line 42, in invoke
    selected_collections = self.llm.literal_eval(chat_response.content)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/llm/base.py", line 49, in literal_eval
    raise ValueError(
ValueError: Invalid JSON/List format for response content:
<think>
好,我现在要帮助用户解决关于出差住亲戚家可以报销多少的问题。首先,我需要明确用户的主要需求是什么。用户想知道在住亲戚家的情况下,出差可以报销的金额是多少。这可能涉及到公司或单位的报销政策,或者是个人财务方面的安排。

接下来,我应该考虑用户可能遇到的具体情况。例如,用户可能想知道报销的具体标准,是按天计算,还是根据一定的费用比例来计算。还可能涉及到是否有额外的开销需要报销,比如交通费、住宿费等。

为了更准确地回答这个问题,我需要了解哪些具体的信息。这可能包括用户的工作性质,因为不同类型的公司报销标准可能不同。此外,用户住亲戚家的时间有多长,以及他们在这期间的具体消费情况,比如是否有餐饮费、住宿费的详细记录。

因此,我应该提出一个简单明了的询问,直接询问用户出差住亲戚家的总天数。这样可以帮助我更好地计算报销金额,并根据实际情况提供更精确的答案。如果用户能提供天数,我就可以参考相关的报销标准,如每天多少元,或者根据实际发生的费用来计算。

总结一下,我需要设计一个简单的问题,询问用户出差住亲戚家的总天数,以便进一步提供准确的报销信息。

oldunclez avatar Mar 06 '25 07:03 oldunclez

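The first error is the same as https://github.com/zilliztech/deep-searcher/issues/104. The second error occurs because the model is too small to answer the question well. It is like asking for a number and getting a letter back: the collection router expects a list it can parse, but the reply starts with the model's <think> reasoning text, so parsing fails.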
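As an illustration of the second failure, here is a minimal sketch of one possible workaround: strip the <think> block from the reply before handing it to ast.literal_eval. This is not the project's own fix. The helper names strip_reasoning and parse_list_response are hypothetical, and the sketch assumes the model puts its real answer after the closing </think> tag.

```python
import ast
import re

# Matches a complete <think>...</think> block, including newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think> blocks that reasoning models prepend to their answer."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If a <think> tag was opened but never closed, drop everything after it.
    unclosed = cleaned.find("<think>")
    if unclosed != -1:
        cleaned = cleaned[:unclosed]
    return cleaned.strip()


def parse_list_response(text: str) -> list:
    """Parse a model reply that is expected to be a Python list literal."""
    cleaned = strip_reasoning(text)
    try:
        result = ast.literal_eval(cleaned)
    except (SyntaxError, ValueError) as exc:
        raise ValueError(f"Reply is not a list literal: {cleaned!r}") from exc
    if not isinstance(result, list):
        raise ValueError(f"Expected a list, got {type(result).__name__}")
    return result


# Hypothetical reply: reasoning first, then the list the router needs.
reply = "<think>\nWhich collection fits?\n</think>\n\n['chailv']"
print(parse_list_response(reply))
```

If the model never closes its <think> block, or never emits a list at all, this still raises ValueError. Stripping the tags only helps when a valid answer is actually present.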

SimFG avatar Mar 06 '25 07:03 SimFG

Even when using

load_from_local_files(
    paths_or_directory="/tmp/cl.txt",
    collection_name="chailv",
    collection_description="chailv",
    force_new_collection=True, # Set force_new_collection to True to drop the original collection and create a new one every time
)

it still throws an error:

root@df017718c0f5:/deep-searcher# python  mytest.py
create collection [chailv] successfully
Loading files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 18.40it/s]
Embedding chunks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.03s/it]
<think> Select agent [ChainOfRAG] to answer the query [出差住亲戚家可以报销多少] </think>

>> Iteration: 1

<think> Perform search [<think>
好的,我需要帮助用户回答关于出差住在亲戚家可以报销多少的问题。根据之前的回答,已经知道出差住宿费通常按照实际发生的费用报销,但不超过规定的标准。为了更详细地回答这个问题,我需要了解用户的具体情况。

首先,用户出差的地区可能影响报销的标准,因为不同地区的住宿费用标准可能不同。例如,一线城市和二线城市的报销标准可能不一样,所以询问出差的具体地点是有必要的。

其次,用户可能有不同的职务或职位,这也会影响报销的标准。通常,不同级别的员工有不同的报销额度,了解用户的职位可以帮助确定适用的具体标准。

另外,了解用户的行程天数也很重要,因为报销金额通常与天数有关,知道了天数,可以估算总的报销金额。

最后,询问是否有其他费用需要报销,比如交通费、餐费等,可以帮助用户提供更全面的信息,确保所有可报销的费用都被考虑在内。

综上所述,我需要进一步询问以下信息:
1. 出差的具体地点在哪里?
2. 您的职位是什么?
3. 出差的天数是多少天?
4. 是否有其他费用需要报销?

这样,我就能根据这些信息提供更准确的报销金额建议。
</think>

出差的具体地点在哪里?] on the vector DB collections: ['chailv', 'deepsearcher'] </think>

<search> Search [<think>
好的,我需要帮助用户回答关于出差住在亲戚家可以报销多少的问题。根据之前的回答,已经知道出差住宿费通常按照实际发生的费用报销,但不超过规定的标准。为了更详细地回答这个问题,我需要了解用户的具体情况。

首先,用户出差的地区可能影响报销的标准,因为不同地区的住宿费用标准可能不同。例如,一线城市和二线城市的报销标准可能不一样,所以询问出差的具体地点是有必要的。

其次,用户可能有不同的职务或职位,这也会影响报销的标准。通常,不同级别的员工有不同的报销额度,了解用户的职位可以帮助确定适用的具体标准。

另外,了解用户的行程天数也很重要,因为报销金额通常与天数有关,知道了天数,可以估算总的报销金额。

最后,询问是否有其他费用需要报销,比如交通费、餐费等,可以帮助用户提供更全面的信息,确保所有可报销的费用都被考虑在内。

综上所述,我需要进一步询问以下信息:
1. 出差的具体地点在哪里?
2. 您的职位是什么?
3. 出差的天数是多少天?
4. 是否有其他费用需要报销?

这样,我就能根据这些信息提供更准确的报销金额建议。
</think>

出差的具体地点在哪里?] in [chailv]...  </search>

2025-03-06 07:55:52,264 [ERROR][handler]: RPC error: [search], <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 4096, actual 1024.: segcore error)>, <Time:{'RPC start': '2025-03-06 07:55:52.044667', 'RPC error': '2025-03-06 07:55:52.263984'}> (decorators.py:140)
2025-03-06 07:55:52,264 [ERROR][search]: Failed to search collection: chailv (milvus_client.py:414)
2025-03-06 07:55:52,264 - CRITICAL - fail to search data, error info: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 4096, actual 1024.: segcore error)>
Traceback (most recent call last):
  File "/deep-searcher/deepsearcher/vector_db/milvus.py", line 113, in search_data
    search_results = self.client.search(
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py", line 415, in search
    raise ex from ex
  File "/usr/local/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py", line 400, in search
    res = conn.search(
          ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 836, in search
    return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 777, in _execute_search
    raise e from e
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 766, in _execute_search
    check_status(response.status)
  File "/usr/local/lib/python3.11/site-packages/pymilvus/client/utils.py", line 64, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 4096, actual 1024.: segcore error)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/deep-searcher/mytest.py", line 26, in <module>
    result = query("出差住亲戚家可以报销多少") # Your question here
             ^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/online_query.py", line 10, in query
    return default_searcher.query(original_query, max_iter=max_iter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/rag_router.py", line 69, in query
    answer, retrieved_results, n_token_retrieval = agent.query(query, **kwargs)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 196, in query
    all_retrieved_results, n_token_retrieval, additional_info = self.retrieve(query, **kwargs)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 178, in retrieve
    intermediate_answer, retrieved_results, n_token1 = self._retrieve_and_answer(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/agent/chain_of_rag.py", line 124, in _retrieve_and_answer
    retrieved_results = self.vector_db.search_data(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deep-searcher/deepsearcher/vector_db/milvus.py", line 133, in search_data
    log.critical(f"fail to search data, error info: {e}")
  File "/deep-searcher/deepsearcher/tools/log.py", line 89, in critical
    raise RuntimeError(message)
RuntimeError: fail to search data, error info: <MilvusException: (code=2000, message=vector dimension mismatch, expected vector size(byte) 4096, actual 1024.: segcore error)>

oldunclez avatar Mar 06 '25 08:03 oldunclez

Try clearing all collections in Milvus. The main cause of this error is that Milvus still contains other collections whose vector dimensions are different.
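The log above shows the search running over both 'chailv' and 'deepsearcher', so the older collection was still present. A minimal sketch of clearing everything follows. The function name drop_all_collections is hypothetical, and the client is assumed to expose list_collections() and drop_collection(collection_name=...), as pymilvus.MilvusClient does. This deletes data, so only run it against an instance you can afford to wipe.

```python
def drop_all_collections(client) -> list:
    """Drop every collection visible to `client` and return the dropped names.

    `client` is assumed to behave like pymilvus.MilvusClient:
    list_collections() returns collection names and
    drop_collection(collection_name=...) removes one collection.
    """
    dropped = []
    for name in client.list_collections():
        client.drop_collection(collection_name=name)
        dropped.append(name)
    return dropped
```

With a real server this would be called as drop_all_collections(MilvusClient(uri=...)), where the URI is whatever your own Milvus deployment uses. After that, reloading the files recreates the collection with the current embedding dimension.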

SimFG avatar Mar 06 '25 08:03 SimFG

config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "bge-m3", "base_url": "http://192.168.23.10/v1","dimension": 1024})

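It is best to double-check which dimension your embedding model supports. I ran into the same situation and eventually found that if the configured dimension exceeds the dimension the model supports, this error appears and the dimension of the returned embedding is doubled. Setting mine to 768 fixed it.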
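One way to confirm this before loading any data is to embed a short probe string and compare the vector length with the configured value. The sketch below assumes a hypothetical callable embed that wraps your embedding endpoint and returns one vector for one string. It is not part of deep-searcher.

```python
from typing import Callable, Sequence


def check_embedding_dimension(
    embed: Callable[[str], Sequence[float]],
    configured_dimension: int,
    probe_text: str = "dimension probe",
) -> int:
    """Embed a probe string and verify the vector length.

    `embed` is a hypothetical wrapper around the embedding endpoint.
    Returns the actual dimension, or raises ValueError on a mismatch.
    """
    actual = len(embed(probe_text))
    if actual != configured_dimension:
        raise ValueError(
            f"Configured dimension {configured_dimension}, "
            f"but the model returned {actual}"
        )
    return actual
```

Running this once against the same model and base_url used in the config shows whether the "dimension" value passed to set_provider_config matches what the model actually returns.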

magicpose avatar Mar 06 '25 11:03 magicpose