[Bug]: The document parsing failed and exited.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch name
main
Commit ID
0f95086
Other environment information
ASUS Tuf Gaming F15 Pro 13980HX 4060 Laptop Graphics Card
Win11
WSL2
Actual behavior
When parsing the document, an error occurred and the program exited.
Expected behavior
No response
Steps to reproduce
Update to the latest version (dev or 0.10.0), upload the document, and then parse it.
Additional information
I believe the issue lies in how the newly added error-output mechanism passes its function arguments. When the LLM returns unexpected content (such as an error message), a function receives an argument of the wrong type, which raises a second exception and aborts the parsing task.
I have tested it, and there is no issue with the code built from the 9b3f5fd commit.
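The suspected failure can be sketched in isolation. This is only an illustration under assumptions, not the actual ragflow code: `set_progress` below is a hypothetical stand-in whose signature (an optional numeric progress value first, then a message) is guessed from the callback named in the docker logs further down this thread, and the returned dictionary keys are invented for the example.

```python
from typing import Optional


def set_progress(prog: Optional[float] = None, msg: str = "Processing...") -> dict:
    """Hypothetical stand-in for a progress callback (assumed signature)."""
    update = {"progress_msg": msg}
    # A negative progress value is treated as an error marker in this sketch.
    if prog is not None and prog < 0:
        update["progress_msg"] = "[ERROR] " + msg
    if prog is not None:
        update["progress"] = prog
    return update


error_text = "Knowledge graph extraction error:{}".format("ERROR: Error code: 400")

# Passing the message positionally binds it to `prog`, so the comparison
# `prog < 0` is evaluated between a str and an int and raises TypeError.
try:
    set_progress(error_text)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# One possible way to avoid the crash: pass the message by keyword so that
# `prog` keeps its numeric-or-None type.
fixed = set_progress(msg=error_text)
print(fixed["progress_msg"])
```

Passing the message by keyword is just one way to keep the types consistent; validating the type of `prog` inside the callback would be another.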
Could you paste the docker logs? They show the call stack. Or could you share a sample file?
2024-08-26 11:19:43 ragflow-server | Traceback (most recent call last):
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 139, in __call__
2024-08-26 11:19:43 ragflow-server |     result, token_count = self._process_document(text, prompt_variables)
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 188, in _process_document
2024-08-26 11:19:43 ragflow-server |     if response.find("ERROR") >=0: raise Exception(response)
2024-08-26 11:19:43 ragflow-server | Exception: ERROR: Error code: 400, with error text {"contentFilter":[{"level":1,"role":"assistant"}],"error":{"code":"1301","message":"系统检测到输入或生成内容可能包含不安全或敏感内容,请您避免输入易产生敏感内容的提示语,感谢您的配合。"}}
2024-08-26 11:19:43 ragflow-server |
2024-08-26 11:19:43 ragflow-server | During handling of the above exception, another exception occurred:
2024-08-26 11:19:43 ragflow-server |
2024-08-26 11:19:43 ragflow-server | Traceback (most recent call last):
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/svr/task_executor.py", line 165, in build
2024-08-26 11:19:43 ragflow-server |     cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
2024-08-26 11:19:43 ragflow-server |     chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/index.py", line 85, in build_knowlege_graph_chunks
2024-08-26 11:19:43 ragflow-server |     graphs.append(_.result().output)
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2024-08-26 11:19:43 ragflow-server |     return self.__get_result()
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2024-08-26 11:19:43 ragflow-server |     raise self._exception
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-08-26 11:19:43 ragflow-server |     result = self.fn(*self.args, **self.kwargs)
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 145, in __call__
2024-08-26 11:19:43 ragflow-server |     if callback: callback("Knowledge graph extraction error:{}".format(str(e)))
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/svr/task_executor.py", line 80, in set_progress
2024-08-26 11:19:43 ragflow-server |     if prog is not None and prog < 0:
2024-08-26 11:19:43 ragflow-server | TypeError: '<' not supported between instances of 'str' and 'int'
Fixed by #2096
Is the latest docker image updated? Or should I build it locally to avoid this issue?
Fixed by #2096
The bug has been fixed. Thanks!