
[Bug]: Fail to bind LLM used by RAPTOR

Open dromeuf opened this issue 1 year ago • 8 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

RAGFlow workspace code commit ID

8939206531d8b994fd10565d4056b24ea599c1c5

RAGFlow image version

v0.15.0-slim

Other environment information

Linux Ubuntu 5.15.167.4-microsoft-standard-WSL2

Actual behavior

I use version 0.15.0-slim with a local Ollama for embedding (snowflake-arctic-embed2) and for the LLM (qwen2.5:14b), and parsing succeeds.

If I activate RAPTOR for a knowledge base, I get an error and parsing fails:


[ERROR]Fail to bind LLM used by RAPTOR: 3 vs. 4
[ERROR]handle_task got exception, please check log

logs:

2024-12-19 10:38:25,090 INFO     32 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2024-12-19T10:38:25.090195", "boot_at": "2024-12-19T09:11:49.795794", "pending": 1, "lag": 0, "done": 4, "failed": 1, "current": {"id": "927bb7e4bdec11efb92d0242ac120006", "doc_id": "af0101f6bde911ef8f240242ac120006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "4576b942bde911efb5920242ac120006", "parser_id": "paper", "parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}}, "name": "MNRAS_67P_ModelSAP_stae1290.pdf", "type": "pdf", "location": "MNRAS_67P_ModelSAP_stae1290.pdf", "size": 6289538, "tenant_id": "ee69c9acbde111efbe830242ac120006", "language": "English", "embd_id": "snowflake-arctic-embed2:latest@Ollama", "pagerank": 0, "img2txt_id": "llama3.2-vision:latest@Ollama", "asr_id": "", "llm_id": "qwen2.5:14b@Ollama", "update_time": 1734600923929, "task_type": "raptor"}}
2024-12-19 10:38:28,448 INFO     32 HTTP Request: POST http://host.docker.internal:11434/api/chat "HTTP/1.1 200 OK"                                                             
2024-12-19 10:38:28,488 INFO     32 HTTP Request: POST http://host.docker.internal:11434/api/embeddings "HTTP/1.1 200 OK"                                                       
2024-12-19 10:38:28,512 ERROR    32 summarize got exception                                                                                                                     
Traceback (most recent call last):                                                                                                                                              
  File "/ragflow/rag/raptor.py", line 92, in summarize                                                                                                                          
    chunks.append((cnt, self._embedding_encode(cnt)))                                                                                                                           
  File "/ragflow/rag/raptor.py", line 48, in _embedding_encode                                                                                                                  
    response = get_embed_cache(self._embd_model.llm_name, txt)                                                                                                                  
  File "/ragflow/graphrag/utils.py", line 104, in get_embed_cache                                                                                                               
    return np.array(json.loads(bin.decode("utf-8")))                                                                                                                            
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?                                                                                                 
2024-12-19 10:38:28,515 INFO     32 set_progress(927bb7e4bdec11efb92d0242ac120006), progress: -1, progress_msg: Page(100000001~100000001): [ERROR]Fail to bind LLM used by RAPTOR: 3 vs. 4
2024-12-19 10:38:28,545 ERROR    32 Fail to bind LLM used by RAPTOR: 3 vs. 4                                                                                                    
Traceback (most recent call last):                                                                                                                                              
  File "/ragflow/rag/svr/task_executor.py", line 438, in do_handle_task                                                                                                         
    chunks, token_count, vector_size = run_raptor(task, chat_model, embedding_model, progress_callback)                                                                         
  File "/ragflow/rag/svr/task_executor.py", line 370, in run_raptor                                                                                                             
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)                                                                                            
  File "/ragflow/rag/raptor.py", line 132, in __call__                                                                                                                          
    assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)                                                                                   
AssertionError: 3 vs. 4
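The inner traceback suggests this isn't really an LLM-binding problem: `get_embed_cache` calls `.decode()` on a value that is already a `str` (a Redis client created with `decode_responses=True` returns strings, not bytes). A defensive sketch of what the lookup presumably needs, with the cache stood in by a plain dict (the function name follows the traceback; everything else is an assumption, not RAGFlow's actual code):

```python
import json
import numpy as np

def get_embed_cache(llm_name, txt, cache):
    # `cache` stands in for the Redis lookup in graphrag/utils.py.
    # Depending on client configuration, the stored value may come
    # back as str or bytes -- handle both before json.loads().
    raw = cache.get((llm_name, txt))
    if raw is None:
        return None  # cache miss
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return np.array(json.loads(raw))
```

With this guard, both a `str`-returning and a `bytes`-returning client yield the same embedding array, and a miss returns `None` instead of crashing.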

Is the local Ollama LLM I'm using for RAPTOR incompatible, or is this some other bug or problem?

Thanks for your great work.

Expected behavior

No response

Steps to reproduce

I can't find any information about the RAPTOR LLM in the documentation, and the problem is identical for every document added.

Additional information

No response

dromeuf avatar Dec 19 '24 10:12 dromeuf

It's OK now with the nightly. Thanks.

dromeuf avatar Dec 21 '24 13:12 dromeuf

For 3 of 51 documents I always get a RAPTOR error, but if I relaunch from the UI while keeping the chunks already computed (without clearing existing chunks), parsing manages to finish and the document reaches success status. Unfortunately, since I launched about twenty parse operations simultaneously, I can't reliably locate the console logs.

For 1 of 51 it never succeeds (Rapport 2005_I.pdf).

Progress:
Page(0~12): reused previous task's chunks.
Page(12~24): reused previous task's chunks.
Page(24~36): reused previous task's chunks.
Page(36~48): reused previous task's chunks.
Page(48~60): reused previous task's chunks.
Page(60~72): reused previous task's chunks.
Page(72~84): reused previous task's chunks.
Page(84~96): reused previous task's chunks.
Page(96~108): reused previous task's chunks.
Page(108~117): reused previous task's chunks.
Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
Task has been received.
Page(100000001~100000001): Cluster one layer: 320 -> 7
Page(100000001~100000001): Cluster one layer: 7 -> 4
Page(100000001~100000001): [ERROR]Fail to bind LLM used by RAPTOR: 2 vs. 3
[ERROR]handle_task got exception, please check log
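For context on the "Cluster one layer: 320 -> 7 -> 4" progress lines: RAPTOR repeatedly clusters the newest layer of chunks and appends one summary chunk per cluster, so each pass shrinks the count until a single cluster remains. A rough sketch of that outer loop (hypothetical names, not RAGFlow's actual code):

```python
def raptor_layers(chunks, cluster_fn, summarize_fn):
    # Repeatedly cluster the newest layer and append one summary per
    # cluster; each pass corresponds to a "Cluster one layer: N -> M"
    # progress message. Returns the (N, M) pairs for inspection.
    layers = []
    layer = list(chunks)
    while len(layer) > 1:
        clusters = cluster_fn(layer)
        if len(clusters) >= len(layer):
            break  # no further reduction possible
        layers.append((len(layer), len(clusters)))
        layer = [summarize_fn(c) for c in clusters]
        chunks.extend(layer)
    return layers
```

If a summary for any cluster is lost along the way (a failed chat or embedding call), the layer ends up shorter than the cluster count, which is exactly the "X vs. Y" mismatch reported in the errors below.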

log console

2024-12-22 10:34:52,402 INFO     43360 HTTP Request: POST http://host.docker.internal:11434/api/chat "HTTP/1.1 200 OK"
2024-12-22 10:34:52,450 INFO     43360 HTTP Request: POST http://host.docker.internal:11434/api/embeddings "HTTP/1.1 200 OK"
2024-12-22 10:34:52,467 ERROR    43360 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2024-12-22 10:34:52,470 INFO     43360 set_progress(994223dac04711efaefd0242ac120006), progress: -1, progress_msg: Page(100000001~100000001): [ERROR]Fail to bind LLM used by RAPTOR: 2 vs. 3
2024-12-22 10:34:52,488 ERROR    43360 Fail to bind LLM used by RAPTOR: 2 vs. 3
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 438, in do_handle_task
    chunks, token_count, vector_size = run_raptor(task, chat_model, embedding_model, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 370, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 132, in __call__
    assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)
AssertionError: 2 vs. 3
2024-12-22 10:34:52,491 INFO     43360 set_progress(994223dac04711efaefd0242ac120006), progress: -1, progress_msg: [ERROR]handle_task got exception, please check log
2024-12-22 10:34:52,507 ERROR    43360 handle_task got exception for task {"id": "994223dac04711efaefd0242ac120006", "doc_id": "5a65db7cbfa011efbd950242ac120006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "d5f8d862bf9f11efb7f40242ac120006", "parser_id": "book", "parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}}, "name": "Rapport 2005_I.pdf", "type": "pdf", "location": "Rapport 2005_I.pdf", "size": 15740366, "tenant_id": "ee69c9acbde111efbe830242ac120006", "language": "English", "embd_id": "snowflake-arctic-embed2:latest@Ollama", "pagerank": 0, "img2txt_id": "llama3.2-vision:latest@Ollama", "asr_id": "", "llm_id": "qwen2.5:14b@Ollama", "update_time": 1734859921824, "task_type": "raptor"}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 511, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 438, in do_handle_task
    chunks, token_count, vector_size = run_raptor(task, chat_model, embedding_model, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 370, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 132, in __call__
    assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)
AssertionError: 2 vs. 3
2024-12-22 10:35:01,784 INFO     43360 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2024-12-22T10:35:01.784164", "boot_at": "2024-12-21T23:47:57.830717", "pending": 0, "lag": 0, "done": 48, "failed": 4, "current": null}
2024-12-22 10:35:04,238 INFO     23 172.18.0.6 - - [22/Dec/2024 10:35:04] "GET /v1/document/list?kb_id=d5f8d862bf9f11efb7f40242ac120006&keywords=&page_size=100&page=1 HTTP/1.1" 200 -
2024-12-22 10:35:19,396 INFO     23 172.18.0.6 - - [22/Dec/2024 10:35:19] "GET /v1/document/list?kb_id=d5f8d862bf9f11efb7f40242ac120006&keywords=&page_size=100&page=1 HTTP/1.1" 200 -
2024-12-22 10:35:31,815 INFO     43360 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2024-12-22T10:35:31.815294", "boot_at": "2024-12-21T23:47:57.830717", "pending": 0, "lag": 0, "done": 48, "failed": 4, "current": null}
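The nightly traded the `decode` crash for its mirror image: `if response:` on a NumPy embedding vector raises the "truth value of an array is ambiguous" `ValueError`. A minimal sketch of an unambiguous cache-hit check (the names here are illustrative, not RAGFlow's actual code):

```python
import numpy as np

def cache_hit(response):
    # "if response:" is ambiguous for arrays with more than one
    # element; test for presence explicitly instead.
    return response is not None and len(response) > 0

assert cache_hit(np.array([0.1, 0.2, 0.3]))  # a real embedding is a hit
assert not cache_hit(None)                   # cache miss
```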

Thanks for your great work

dromeuf avatar Dec 22 '24 09:12 dromeuf

@KevinHuSh Hi, I'm still having this issue in 0.15.1. It happens when I use Ollama (llama3.1), and also when I use the Together.ai LLM (Llama 3.3).

rplescia avatar Jan 14 '25 12:01 rplescia

@KevinHuSh This is still a bug for me in 0.15.1 using either Ollama or Together.ai as an inference server.

2025-01-20 13:31:37,682 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-20 13:31:38,446 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 322
2025-01-20 13:31:39,179 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 322
2025-01-20 13:31:39,467 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-20 13:31:40,348 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 359
2025-01-20 13:31:41,174 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 359
2025-01-20 13:31:43,019 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-20 13:31:44,021 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 411
2025-01-20 13:31:44,369 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2025-01-20 13:31:44,370 ERROR 17 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 82, in summarize
    cnt = self._chat("You're a helpful assistant.",
  File "/ragflow/rag/raptor.py", line 43, in _chat
    raise Exception(response)
Exception: ERROR: Error code: 500 - {'error': {'message': 'POST predict: Post "http://127.0.0.1:37507/completion": EOF', 'type': 'api_error', 'param': None, 'code': None}}
2025-01-20 13:31:44,478 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2025-01-20 13:31:44,478 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2025-01-20 13:31:44,479 INFO 17 HTTP Request: POST http://ollamainference:11434/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2025-01-20 13:31:44,480 ERROR 17 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 82, in summarize
    cnt = self._chat("You're a helpful assistant.",
  File "/ragflow/rag/raptor.py", line 43, in _chat
    raise Exception(response)
Exception: ERROR: Error code: 500 - {'error': {'message': 'an error was encountered while running the model: unexpected EOF', 'type': 'api_error', 'param': None, 'code': None}}
2025-01-20 13:31:44,481 ERROR 17 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 82, in summarize
    cnt = self._chat("You're a helpful assistant.",
  File "/ragflow/rag/raptor.py", line 43, in _chat
    raise Exception(response)
Exception: ERROR: Error code: 500 - {'error': {'message': 'an error was encountered while running the model: unexpected EOF', 'type': 'api_error', 'param': None, 'code': None}}
2025-01-20 13:31:44,481 ERROR 17 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 82, in summarize
    cnt = self._chat("You're a helpful assistant.",
  File "/ragflow/rag/raptor.py", line 43, in _chat
    raise Exception(response)
Exception: ERROR: Error code: 500 - {'error': {'message': 'an error was encountered while running the model: unexpected EOF', 'type': 'api_error', 'param': None, 'code': None}}
2025-01-20 13:31:45,010 ERROR 17 LLMBundle.encode can't update token usage for e39eb6dccce011ef965e0242ac120006/EMBEDDING used_tokens: 411
2025-01-20 13:31:45,016 INFO 17 set_progress(a0892a42d73211ef9bad0242ac120006), progress: -1, progress_msg: 13:31:45 Page(100000001~100000001): [ERROR]Fail to bind LLM used by RAPTOR: 11 vs. 15
2025-01-20 13:31:45,022 ERROR 17 Fail to bind LLM used by RAPTOR: 11 vs. 15
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 440, in do_handle_task
    chunks, token_count, vector_size = run_raptor(task, chat_model, embedding_model, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 372, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 132, in __call__
    assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)
AssertionError: 11 vs. 15
2025-01-20 13:31:45,025 INFO 17 set_progress(a0892a42d73211ef9bad0242ac120006), progress: -1, progress_msg: 13:31:45 [ERROR]handle_task got exception, please check log
2025-01-20 13:31:45,031 ERROR 17 handle_task got exception for task {"id": "a0892a42d73211ef9bad0242ac120006", "doc_id": "210ca866d1ce11ef92290242ac120006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "e45d2d96d1cd11efb6c60242ac120006", "parser_id": "laws", "parser_config": {"auto_keywords": 10, "auto_questions": 0, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful and consistent with the numbering, do not make things up. Paragraphs as follows:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 875, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "layout_recognize": true, "task_page_size": 12, "pages": [[1, 1024]]}, "name": "Project-Coach-Acquisition-Term-Facility-Agreement-EXECUTED-05.09.2022_Redacted.pdf", "type": "pdf", "location": "Project-Coach-Acquisition-Term-Facility-Agreement-EXECUTED-05.09.2022_Redacted.pdf", "size": 902627, "tenant_id": "e39eb6dccce011ef965e0242ac120006", "language": "English", "embd_id": "BAAI/bge-large-en-v1.5@FastEmbed", "pagerank": 0, "img2txt_id": "", "asr_id": "", "llm_id": "llama3.1___OpenAI-API@OpenAI-API-Compatible", "update_time": 1737379791346, "task_type": "raptor"}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 513, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 440, in do_handle_task
    chunks, token_count, vector_size = run_raptor(task, chat_model, embedding_model, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 372, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 132, in __call__
    assert len(chunks) - end == n_clusters, "{} vs. {}".format(len(chunks) - end, n_clusters)
AssertionError: 11 vs. 15

rplescia avatar Jan 20 '25 13:01 rplescia

Ollama seems to have this kind of issue. What about switching to another LLM?

KevinHuSh avatar Jan 23 '25 01:01 KevinHuSh

I tried it with Together.ai and Llama 3.3; the same thing happens.

rplescia avatar Jan 23 '25 09:01 rplescia

I tried it with GPUStack and qwen2.5 7b; the same thing happens.

verigle avatar Feb 11 '25 01:02 verigle

I encountered the same issue while using vLLM with both Qwen2.5-Coder and InternVL2.5-MPO models.

2025-02-14 03:32:08,751 INFO     16 HTTP Request: POST http://host.docker.internal:20009/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-14 03:32:08,789 INFO     16 HTTP Request: POST http://host.docker.internal:20010/v1/embeddings "HTTP/1.1 200 OK"
2025-02-14 03:32:08,806 ERROR    16 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:13,899 INFO     16 HTTP Request: POST http://host.docker.internal:20009/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-14 03:32:13,900 INFO     16 HTTP Request: POST http://host.docker.internal:20009/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-14 03:32:13,901 INFO     16 HTTP Request: POST http://host.docker.internal:20009/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-14 03:32:13,976 INFO     16 HTTP Request: POST http://host.docker.internal:20010/v1/embeddings "HTTP/1.1 200 OK"
2025-02-14 03:32:13,998 ERROR    16 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:14,057 INFO     16 HTTP Request: POST http://host.docker.internal:20010/v1/embeddings "HTTP/1.1 200 OK"
2025-02-14 03:32:14,062 INFO     16 HTTP Request: POST http://host.docker.internal:20010/v1/embeddings "HTTP/1.1 200 OK"
2025-02-14 03:32:14,094 ERROR    16 summarize got exception
Traceback (most recent call last):
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:14,161 INFO     16 HTTP Request: POST http://host.docker.internal:20010/v1/embeddings "HTTP/1.1 200 OK"
2025-02-14 03:32:14,198 INFO     16 set_progress(8463e5f6ea0a11efb27a0242ac1a0006), progress: -1, progress_msg: 03:32:14 [ERROR]Fail to bind LLM used by RAPTOR: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:14,205 ERROR    16 Fail to bind LLM used by RAPTOR: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 499, in do_handle_task
    chunks, token_count = run_raptor(task, chat_model, embedding_model, vector_size, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 406, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 134, in __call__
    raise th.result()
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:14,217 INFO     16 set_progress(8463e5f6ea0a11efb27a0242ac1a0006), progress: -1, progress_msg: 03:32:14 [ERROR][Exception]: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2025-02-14 03:32:14,229 ERROR    16 handle_task got exception for task {"id": "8463e5f6ea0a11efb27a0242ac1a0006", "doc_id": "bd5ecfbcea0811efa89e0242ac1a0006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "a71e9372ea0811efb49d0242ac1a0006", "parser_id": "naive", "parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": true, "prompt": "\u8acb\u4f9d\u7167\u4e0b\u5217\u8981\u6c42\u5c0d\u4ee5\u4e0b\u5167\u5bb9\u9032\u884cRAPTOR\u6f14\u7b97\u6cd5\u8655\u7406\uff1a\n1. \u56b4\u683c\u4fdd\u7559\u6240\u6709\u95dc\u9375\u6578\u64da\uff0c\u78ba\u4fdd\u6578\u5b57\u5b8c\u5168\u6b63\u78ba\u3002\n2. \u7dad\u6301\u539f\u6587\u6838\u5fc3\u610f\u7fa9\uff0c\u8acb\u52ff\u66f4\u52d5\u3002\n3. \u8acb\u52ff\u65b0\u589e\u4efb\u4f55\u539f\u6587\u672a\u63d0\u53ca\u7684\u8cc7\u8a0a\u3002\n\n\u5167\u5bb9\u5982\u4e0b\uff1a\n{cluster_content}", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 128, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": true}, "name": "adrv9008-1-w-9008-2-w-9009-w-hardware-reference-manual-ug-1295.pdf", "type": "pdf", "location": "adrv9008-1-w-9008-2-w-9009-w-hardware-reference-manual-ug-1295.pdf", "size": 3347239, "tenant_id": "3b764974e9d311efa4ee0242ac1a0006", "language": "English", "embd_id": "gte-Qwen2-7B-instruct___OpenAI-API@OpenAI-API-Compatible", "pagerank": 0, "kb_parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": true, "prompt": "\u8acb\u4f9d\u7167\u4e0b\u5217\u8981\u6c42\u5c0d\u4ee5\u4e0b\u5167\u5bb9\u9032\u884cRAPTOR\u6f14\u7b97\u6cd5\u8655\u7406\uff1a\n1. \u56b4\u683c\u4fdd\u7559\u6240\u6709\u95dc\u9375\u6578\u64da\uff0c\u78ba\u4fdd\u6578\u5b57\u5b8c\u5168\u6b63\u78ba\u3002\n2. \u7dad\u6301\u539f\u6587\u6838\u5fc3\u610f\u7fa9\uff0c\u8acb\u52ff\u66f4\u52d5\u3002\n3. 
\u8acb\u52ff\u65b0\u589e\u4efb\u4f55\u539f\u6587\u672a\u63d0\u53ca\u7684\u8cc7\u8a0a\u3002\n\n\u5167\u5bb9\u5982\u4e0b\uff1a\n{cluster_content}", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 128, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": true}, "img2txt_id": "InternVL2.5-MPO___OpenAI-API@OpenAI-API-Compatible", "asr_id": "", "llm_id": "InternVL2.5-MPO___OpenAI-API@OpenAI-API-Compatible", "update_time": 1739451636352, "task_type": "raptor"}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 626, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 499, in do_handle_task
    chunks, token_count = run_raptor(task, chat_model, embedding_model, vector_size, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 406, in run_raptor
    chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
  File "/ragflow/rag/raptor.py", line 134, in __call__
    raise th.result()
  File "/ragflow/rag/raptor.py", line 92, in summarize
    chunks.append((cnt, self._embedding_encode(cnt)))
  File "/ragflow/rag/raptor.py", line 49, in _embedding_encode
    if response:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
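Across all these reports, the "X vs. Y" message is only a symptom: `raptor.py`'s `__call__` asserts that summarization appended exactly one chunk per cluster, so any `summarize` failure (the cache bugs above, or a 500 from the model server) leaves the count short. A toy sketch of that accounting, with hypothetical names (not RAGFlow's actual code):

```python
def summarize_clusters(chunks, clusters, summarize):
    # Mirrors the accounting behind "Fail to bind LLM used by RAPTOR:
    # X vs. Y": every cluster must yield exactly one summary chunk.
    end = len(chunks)
    for cluster in clusters:
        try:
            chunks.append(summarize(cluster))
        except Exception:
            pass  # a swallowed failure leaves the count short
    appended = len(chunks) - end
    assert appended == len(clusters), "{} vs. {}".format(appended, len(clusters))
    return chunks
```

So when a summarize thread dies, the assertion reports the shortfall (e.g. "11 vs. 15" means 4 of 15 cluster summaries were lost), which is why fixing the upstream embedding-cache or model-server errors makes the binding error disappear.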

evenkeelhuang avatar Feb 13 '25 23:02 evenkeelhuang