[Bug]: build_TOC failed.

Open wcbzcy opened this issue 2 weeks ago • 2 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (非英文标题的提交将会被直接关闭) (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

None

RAGFlow image version

0.22.1

Other environment information


Actual behavior

2025-12-08 10:13:40,036 INFO 24 set_progress(62b72948d3db11f0adf626e28beec172), progress: -1, progress_msg: 10:13:39 [ERROR][Exception]: 'str' object has no attribute 'get'
2025-12-08 10:13:40,047 ERROR 24 handle_task got exception for task {"id": "62b72948d3db11f0adf626e28beec172", "doc_id": "11511060d3da11f0adf626e28beec172", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b9d0beecd03411f096a852b9d901a252", "parser_id": "naive", "parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": true, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file"}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "name": "FlexGrid_Class_Wijmo_API.md", "type": "doc", "location": "FlexGrid_Class_Wijmo_API.md", "size": 193069, "tenant_id": "9b20df7ad02c11f096a852b9d901a252", "language": "English", "embd_id": "text-embedding-3-large@OpenAI", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": true, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0, "scope": "file"}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "claude-sonnet-4-5-20250929@Anthropic", "update_time": 1765159965793, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 967, in handle_task
    await do_handle_task(task)
  File "/ragflow/common/connection_utils.py", line 76, in async_wrapper
    return await func(*args, **kwargs)
  File "/ragflow/rag/svr/task_executor.py", line 938, in do_handle_task
    d = toc_thread.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/ragflow/rag/svr/task_executor.py", line 396, in build_TOC
    toc: list[dict] = trio.run(run_toc_from_text, [d["content_with_weight"] for d in docs], chat_mdl, progress_callback)
  File "/ragflow/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 2435, in run
    raise runner.main_task_outcome.error
  File "/ragflow/rag/prompts/generator.py", line 770, in run_toc_from_text
    "level": toc_item.get("level", "0"),
AttributeError: 'str' object has no attribute 'get'

Expected behavior

No response

Steps to reproduce

None

Additional information

No response

wcbzcy avatar Dec 08 '25 02:12 wcbzcy

Hi @wcbzcy! I'm Dosu and I’m helping the ragflow team.

The error happens because the TOC generation code expects every item to be a dictionary but sometimes receives a string instead, usually because of malformed LLM output or missing type validation in the pipeline. Specifically, in run_toc_from_text, the code calls .get() on each TOC item without checking its type, so it crashes when an item is a string or any other non-dict value. This is a known issue in the codebase and has been discussed in previous threads and code reviews. The recommended fix is to validate that all TOC items are dictionaries before accessing them, either by filtering out the ones that are not or by raising a descriptive exception when the structure is invalid, which makes the failure easier to debug. You can see more details and code context in this issue and the relevant code in generator.py.
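
For illustration, the filtering variant of that fix might look like the sketch below. It assumes the parsed LLM output is already a plain Python list; `keep_dict_items` is a hypothetical helper, not a function from the RAGFlow codebase, and the sample data is made up. Only the `.get("level", "0")` call is taken from the traceback.

```python
from typing import Any


def keep_dict_items(toc_items: list[Any]) -> tuple[list[dict], list[Any]]:
    """Split parsed TOC items into usable dicts and rejected entries.

    Hypothetical helper for illustration; not part of RAGFlow.
    """
    kept: list[dict] = []
    rejected: list[Any] = []
    for item in toc_items:
        if isinstance(item, dict):
            kept.append(item)
        else:
            rejected.append(item)
    return kept, rejected


# Made-up parsed output: one well-formed item and one stray string.
parsed = [{"level": "1"}, "Introduction"]
kept, rejected = keep_dict_items(parsed)

# Every element of `kept` is a dict, so .get() cannot raise AttributeError here.
levels = [item.get("level", "0") for item in kept]
```

Returning the rejected entries alongside the kept ones lets the caller log what the model produced instead of silently dropping it.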

Recent pull requests have also updated the TOC pipeline to use async/await and clarified that run_toc_from_text should process a list of strings and return a list of dicts, reinforcing the need for strict type validation in this part of the code (PR #11785, PR #11685). If you’re updating or patching this logic, make sure to enforce type checks right after parsing the LLM output, before any .get() calls on TOC items.
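
A check placed right after parsing could be sketched roughly as follows. This assumes the raw LLM reply arrives as a JSON string; `parse_toc_response` and `TocFormatError` are hypothetical names chosen for the example and do not come from RAGFlow.

```python
import json


class TocFormatError(ValueError):
    """Raised when the LLM reply is not a list of TOC dictionaries (hypothetical)."""


def parse_toc_response(raw: str) -> list[dict]:
    """Parse a raw reply and fail early, with context, on an unexpected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise TocFormatError(f"TOC reply is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise TocFormatError(
            f"expected a list of TOC items, got {type(data).__name__}"
        )

    # Validate every item before any downstream code calls .get() on it.
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise TocFormatError(
                f"TOC item {index} is {type(item).__name__}, expected dict: {item!r}"
            )
    return data
```

With a guard like this, a reply such as `["Introduction"]` would fail with a message naming the offending item rather than with a bare `AttributeError` deep in the pipeline.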

To reply, just mention @dosu.



dosubot[bot] avatar Dec 08 '25 02:12 dosubot[bot]

@wcbzcy Hi, thank you for your report!

If possible, could you please send the failed document to [email protected] ? This will help us reproduce the issue and investigate it more accurately. It would also be helpful if you could provide more details about your parsing configuration and model configuration.

If the data is sensitive and cannot be shared, you may also try upgrading to the nightly version, which contains several fixes that might resolve the problem.

Thank you again for your feedback — we really appreciate your contribution!

buua436 avatar Dec 08 '25 03:12 buua436

check #11826

Magicbook1108 avatar Dec 17 '25 11:12 Magicbook1108