[Bug]: [ERROR]Fail to bind LLM used by RAPTOR: {}
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
g891ee85f
RAGFlow image version
v0.16.0-36-g891ee85f slim
Other environment information
Ubuntu 22.04
Actual behavior
I see the following strange behavior during knowledge base creation:
- Initially, I had the system model set to a locally hosted (Ollama) mistral-small.
- Later, I switched the system model to a GPU-hosted Llama 70B (a server with a GPU exposing an OpenAI-compatible API).
- I am creating a new KB with the following settings (only the relevant part is shown):
- I upload the documents to the KB and start parsing.
- It successfully creates chunks and questions, but fails during the RAPTOR process.
- In the interface, it throws the following error:
17:47:26 Page(0~100000000): Reused previous task's chunks.
17:47:28 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
17:47:28 Task has been received.
17:47:43 [ERROR]Fail to bind LLM used by RAPTOR: {}
17:47:43 [ERROR][Exception]: {}
- When I look at the executor logs, I see the following (please see the Additional information section). A direct check of the Ollama embeddings endpoint is sketched right after this list.
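
To narrow this down, here is a minimal sketch that calls the same Ollama embeddings endpoint directly, outside RAGFlow. It assumes the default Ollama host/port and reuses the embedding model tag from the task log in the Additional information section; judging by the traceback, the `{}` that RAGFlow prints appears to be the raw error body returned by the Ollama server:

```python
# Minimal sketch (assumptions: default Ollama host/port; model tag taken
# from the embd_id in the task log below). Calls the Ollama embeddings
# endpoint directly to see whether the server itself returns the "{}" body.
import ollama

client = ollama.Client(host="http://localhost:11434")
try:
    res = client.embeddings(
        model="hf.co/yoeven/multilingual-e5-large-instruct-Q5_K_M-GGUF:latest",
        prompt="test",
    )
    print("ok, embedding size:", len(res["embedding"]))
except ollama.ResponseError as e:
    # str(e) is what ends up in RAGFlow's "[ERROR]Fail to bind LLM ..." message;
    # here it prints the raw HTTP error body and status code instead.
    print("status:", e.status_code, "body:", str(e))
```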
Expected behavior
It seems to me that, for some reason, the "raptor model" still points to Ollama, ignoring the fact that I changed the system chat model. @KevinHuSh, can you please look into this issue?
Steps to reproduce
See above
Additional information
2025-02-12 17:48:14,787 INFO 56 set_progress(4b62a1dae95011efb9030242ac120006), progress: -1, progress_msg: 17:48:14 [ERROR]Fail to bind LLM used by RAPTOR: {}
2025-02-12 17:48:14,797 ERROR 56 Fail to bind LLM used by RAPTOR: {}
Traceback (most recent call last):
File "/ragflow/rag/svr/task_executor.py", line 499, in do_handle_task
chunks, token_count = run_raptor(task, chat_model, embedding_model, vector_size, progress_callback)
File "/ragflow/rag/svr/task_executor.py", line 406, in run_raptor
chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
File "/ragflow/rag/raptor.py", line 134, in __call__
raise th.result()
File "/ragflow/rag/raptor.py", line 92, in summarize
chunks.append((cnt, self._embedding_encode(cnt)))
File "/ragflow/rag/raptor.py", line 51, in _embedding_encode
embds, _ = self._embd_model.encode([txt])
File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7fa9c3ff32e0>", line 31, in encode
File "/ragflow/api/db/services/llm_service.py", line 235, in encode
embeddings, used_tokens = self.mdl.encode(texts)
File "<@beartype(rag.llm.embedding_model.OllamaEmbed.encode) at 0x7fa9c68e1900>", line 31, in encode
File "/ragflow/rag/llm/embedding_model.py", line 262, in encode
res = self.client.embeddings(prompt=txt,
File "/ragflow/.venv/lib/python3.10/site-packages/ollama/_client.py", line 201, in embeddings
return self._request(
File "/ragflow/.venv/lib/python3.10/site-packages/ollama/_client.py", line 74, in _request
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: {}
2025-02-12 17:48:14,802 INFO 56 set_progress(4b62a1dae95011efb9030242ac120006), progress: -1, progress_msg: 17:48:14 [ERROR][Exception]: {}
2025-02-12 17:48:14,809 ERROR 56 handle_task got exception for task {"id": "4b62a1dae95011efb9030242ac120006", "doc_id": "f21521ace94f11ef92af0242ac120006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "d3f84a3ce94f11ef8cfe0242ac120006", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 1024, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 128, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "\u041e\u0431 \u0443\u0442\u0432\u0435\u0440\u0436\u0434\u0435\u043d\u0438\u0438 \u0444\u0435\u0434\u0435\u0440\u0430\u043b\u044c\u043d\u044b\u0445 \u043d\u043e\u0440\u043c \u0438 \u043f\u0440\u0430\u0432\u0438\u043b \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043f\u0440\u043e\u043c\u044b\u0448\u043b\u0435\u043d\u043d\u043e\u0439 \u0431\u0435\u0437\u043e\u043f\u0430\u0441\u043d\u043e\u0441\u0442\u0438 \u041f\u0440\u0430\u0432\u0438\u043b\u0430 \u0431\u0435\u0437\u043e\u043f\u0430\u0441\u043d\u043e\u0441\u0442\u0438 \u043e\u043f (1) (1).doc", "type": "doc", "location": "\u041e\u0431 \u0443\u0442\u0432\u0435\u0440\u0436\u0434\u0435\u043d\u0438\u0438 \u0444\u0435\u0434\u0435\u0440\u0430\u043b\u044c\u043d\u044b\u0445 \u043d\u043e\u0440\u043c \u0438 \u043f\u0440\u0430\u0432\u0438\u043b \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043f\u0440\u043e\u043c\u044b\u0448\u043b\u0435\u043d\u043d\u043e\u0439 \u0431\u0435\u0437\u043e\u043f\u0430\u0441\u043d\u043e\u0441\u0442\u0438 \u041f\u0440\u0430\u0432\u0438\u043b\u0430 \u0431\u0435\u0437\u043e\u043f\u0430\u0441\u043d\u043e\u0441\u0442\u0438 \u043e\u043f (1) (1).doc", "size": 1335481, "tenant_id": "f5ba73a0c6ab11ef93290242ac120006", "language": "English", "embd_id": "hf.co/yoeven/multilingual-e5-large-instruct-Q5_K_M-GGUF:latest@Ollama", "pagerank": 0, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 1024, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 128, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "img2txt_id": "", "asr_id": "", "llm_id": "llama-3.1-70b-instruct___OpenAI-API@OpenAI-API-Compatible", "update_time": 1739371654320, "task_type": "raptor"}
Traceback (most recent call last):
File "/ragflow/rag/svr/task_executor.py", line 626, in handle_task
do_handle_task(task)
File "/ragflow/rag/svr/task_executor.py", line 499, in do_handle_task
chunks, token_count = run_raptor(task, chat_model, embedding_model, vector_size, progress_callback)
File "/ragflow/rag/svr/task_executor.py", line 406, in run_raptor
chunks = raptor(chunks, row["parser_config"]["raptor"]["random_seed"], callback)
File "/ragflow/rag/raptor.py", line 134, in __call__
raise th.result()
File "/ragflow/rag/raptor.py", line 92, in summarize
chunks.append((cnt, self._embedding_encode(cnt)))
File "/ragflow/rag/raptor.py", line 51, in _embedding_encode
embds, _ = self._embd_model.encode([txt])
File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7fa9c3ff32e0>", line 31, in encode
File "/ragflow/api/db/services/llm_service.py", line 235, in encode
embeddings, used_tokens = self.mdl.encode(texts)
File "<@beartype(rag.llm.embedding_model.OllamaEmbed.encode) at 0x7fa9c68e1900>", line 31, in encode
File "/ragflow/rag/llm/embedding_model.py", line 262, in encode
res = self.client.embeddings(prompt=txt,
File "/ragflow/.venv/lib/python3.10/site-packages/ollama/_client.py", line 201, in embeddings
return self._request(
File "/ragflow/.venv/lib/python3.10/site-packages/ollama/_client.py", line 74, in _request
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: {}
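
To make the task payload above easier to read, here is a small sketch that extracts the model bindings it carries (`task_line` is a hypothetical variable holding that JSON object as a string):

```python
# Sketch: decode the task JSON from the "handle_task got exception" log line
# above and print the model bindings it carries.
import json

task = json.loads(task_line)
print("chat model:", task["llm_id"])   # llama-3.1-70b-instruct___OpenAI-API@OpenAI-API-Compatible
print("embedding: ", task["embd_id"])  # hf.co/...-GGUF:latest@Ollama
```

If I read the traceback correctly, the exception is raised in `OllamaEmbed.encode`, i.e. on the embedding path, which matches the Ollama-bound `embd_id` above.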