
[Question]: The problem of document progress being stuck all the time

Open channlang opened this issue 1 month ago • 8 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

I changed the table collation from utf8mb4_0900_ai_ci to utf8mb4_general_ci, saw an error on the platform, and changed it back. Since then, the document progress no longer changes, and the backend log reports no errors. The progress log and progress status in the database remain unchanged. I'm not sure whether this is related to changing the database collation.

channlang avatar Dec 10 '25 03:12 channlang

The main point is that I have already changed the database back, but it still doesn't work.

channlang avatar Dec 10 '25 03:12 channlang

RAGFlow’s document progress logic has no explicit dependency on database collation, but schema or collation changes can cause silent failures in the background job that updates progress, and these failures often don’t show up in logs. Even after reverting the collation, the system can remain stuck if the Redis queue or task executor is out of sync with the database, or if a document record contains a hidden config or JSON error. Simply reverting the collation or updating MySQL rows usually isn’t enough to recover.

Here’s a safe recovery procedure that’s worked for similar cases:

  1. Stop all RAGFlow services (including task executors and backend).
  2. Back up your MySQL database (especially the document and task tables) and Redis data.
  3. In MySQL, find and reset or delete stuck tasks and documents:
    SELECT * FROM document WHERE progress < 1 OR run != '1';
    SELECT * FROM task WHERE progress < 1;
    
    To clear them:
    DELETE FROM task WHERE progress < 1;
    UPDATE document SET progress = 0, run = '2' WHERE progress < 1;
    
  4. Clear the main queue and executor keys in Redis. Run these from a shell where redis-cli is available:
    redis-cli DEL rag_flow_svr_queue
    redis-cli DEL rag_flow_svr_queue_1
    redis-cli DEL TASKEXE
    redis-cli --scan --pattern 'task_executor_*' | xargs -n 1 redis-cli DEL
    
  5. Restart all RAGFlow services and check if document progress updates resume.

If you’re still stuck, check for JSON errors in the parser_config field of your documents, since invalid JSON can silently break progress updates. Also, make sure the task executor and Redis are healthy and running. Sometimes, just restarting the backend and Redis can resolve a stuck progress updater thread.
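The parser_config check mentioned above can be sketched in Python. This is only an illustration under assumptions: it assumes parser_config is stored as JSON text and that you have already fetched (id, parser_config) pairs with whatever MySQL client you use. The function name, the `id` column, and the sample rows are hypothetical and not part of RAGFlow.

```python
import json


def find_invalid_json(rows):
    """Return (doc_id, error message) for every value that does not parse as JSON."""
    problems = []
    for doc_id, raw in rows:
        try:
            json.loads(raw)
        except (TypeError, ValueError) as exc:
            # ValueError covers json.JSONDecodeError; TypeError covers NULL/None values.
            problems.append((doc_id, str(exc)))
    return problems


# Hypothetical sample standing in for the result of a query such as
# SELECT id, parser_config FROM document (column names are assumptions).
sample_rows = [
    ("doc-ok", '{"chunk_token_num": 1200, "delimiter": "\\n"}'),
    ("doc-bad", '{"chunk_token_num": 1200, '),
    ("doc-null", None),
]

for doc_id, error in find_invalid_json(sample_rows):
    print(f"{doc_id}: {error}")
```

Any document id this reports is a candidate for manual inspection before restarting the services.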

To avoid this in the future: don’t use MariaDB, keep to the latest nightly builds, and monitor logs for silent failures. If you want more targeted SQL/Redis commands or help analyzing logs, let me know.

dosubot[bot] avatar Dec 10 '25 03:12 dosubot[bot]

It doesn't seem to be stuck, because I can see the embedding process in the backend log, but the progress status and progress log in the document table no longer change.

Logs:

2025-12-10 11:20:58,063 INFO 47 HEAD http://es01:9200/ragflow_76affcaa4b5411f0a2490242ac140006 [status:200 duration:0.004s]

2025-12-10 11:20:58,189 INFO 47 From minio(0.12595784291625023) 人工智能全球治理行动计划(1).txt/人工智能全球治理行动计划(1).txt

2025-12-10 11:20:58,202 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 0.1, progress_msg: 11:20:58 Page(1~100000001): Start to parse.

2025-12-10 11:20:58,215 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 0.8, progress_msg: 11:20:58 Page(1~100000001): Finish parsing.

2025-12-10 11:20:58,441 INFO 47 naive_merge(人工智能全球治理行动计划(1).txt): 0.22622135281562805

2025-12-10 11:20:58,442 INFO 47 Chunking(0.3785380534827709) 人工智能全球治理行动计划(1).txt/人工智能全球治理行动计划(1).txt done

2025-12-10 11:20:58,442 INFO 47 MINIO PUT(人工智能全球治理行动计划(1).txt) cost 0.000 s

2025-12-10 11:20:58,442 INFO 47 Build document 人工智能全球治理行动计划(1).txt: 0.38s

2025-12-10 11:20:58,449 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: None, progress_msg: 11:20:58 Page(1~100000001): Generate 7 chunks

2025-12-10 11:20:59,472 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 0.7285714285714285, progress_msg:

2025-12-10 11:20:59,474 INFO 47 Embedding chunks (1.02s)

2025-12-10 11:20:59,480 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: None, progress_msg: 11:20:59 Page(1~100000001): Embedding chunks (1.02s)

2025-12-10 11:20:59,515 INFO 47 PUT http://es01:9200/ragflow_76affcaa4b5411f0a2490242ac140006/_bulk?refresh=false&timeout=60s [status:200 duration:0.019s]

2025-12-10 11:20:59,528 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 0.8142857142857143, progress_msg:

2025-12-10 11:20:59,565 INFO 47 PUT http://es01:9200/ragflow_76affcaa4b5411f0a2490242ac140006/_bulk?refresh=false&timeout=60s [status:200 duration:0.025s]

2025-12-10 11:20:59,571 INFO 47 Indexing doc(人工智能全球治理行动计划(1).txt), page(0-100000000), chunks(7), elapsed: 0.09

2025-12-10 11:20:59,587 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 1.0, progress_msg: 11:20:59 Page(1~100000001): Indexing done (0.10s). Task done (1.67s)

2025-12-10 11:20:59,587 INFO 47 Chunk doc(人工智能全球治理行动计划(1).txt), page(0-100000000), chunks(7), token(1024), elapsed:1.67

2025-12-10 11:20:59,588 INFO 47 handle_task done for task {"id": "3dd4f162d57711f0891a0242ac120006", "doc_id": "00e8d48cd56611f0bd770242ac120006", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "fd8f1140d50411f0b1c30242ac120006", "parser_id": "naive", "parser_config": {"chunk_token_num": 1200, "delimiter": "\n", "html4excel": true, "layout_recognize": "DeepDOC", "raptor": {"use_raptor": false, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light", "community": false, "resolution": false}, "auto_keywords": 0, "auto_questions": 0, "tag_kb_ids": [], "topn_tags": 1, "filename_embd_weight": 0.1, "task_page_size": null, "pages": null}, "name": "\u4eba\u5de5\u667a\u80fd\u5168\u7403\u6cbb\u7406\u884c\u52a8\u8ba1\u5212(1).txt", "type": "doc", "location": "\u4eba\u5de5\u667a\u80fd\u5168\u7403\u6cbb\u7406\u884c\u52a8\u8ba1\u5212(1).txt", "size": 19399, "tenant_id": "76affcaa4b5411f0a2490242ac140006", "language": "English", "embd_id": "bge-m3@Ollama", "pagerank": 0, "kb_parser_config": {"chunk_token_num": 1200, "delimiter": "\n", "html4excel": true, "layout_recognize": "DeepDOC", "raptor": {"use_raptor": false, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": false, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light", "community": false, "resolution": false}, "auto_keywords": 0, "auto_questions": 0, "tag_kb_ids": [], "topn_tags": 1, "filename_embd_weight": 0.1, "task_page_size": null, "pages": null}, "img2txt_id": "", "asr_id": "", "llm_id": "duier007/Qwen3-235B-A22B-W8A8___OpenAI-API@OpenAI-API-Compatible", "update_time": 1765336856378, "task_type": ""}

channlang avatar Dec 10 '25 03:12 channlang

What do you mean by stuck, what's the status?

[Image attached]

KevinHuSh avatar Dec 10 '25 03:12 KevinHuSh

[Two images attached]

In this situation, the progress is no longer updated. In fact, both chunking and embedding have completed, and the logs show completion, but the stored status has not changed. It seems that the set_progress operation shown in the server log did not take effect.
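To compare what the worker log reports with what is stored in the database, the set_progress lines quoted earlier in the thread can be reduced to the last progress value per task. The following is a rough sketch, not RAGFlow code: the regular expression is inferred only from the log excerpt in this thread, the function name is hypothetical, and treating "progress: None" as a message-only update is an assumption.

```python
import re

# Pattern inferred from the log lines in this thread, e.g.
# "... set_progress(<task id>), progress: 0.8, progress_msg: ..."
SET_PROGRESS = re.compile(
    r"set_progress\((?P<task_id>[0-9a-f]+)\), "
    r"progress: (?P<progress>[^,]+), progress_msg:"
)


def last_logged_progress(lines):
    """Map each task id to the last numeric progress value seen in the log."""
    latest = {}
    for line in lines:
        match = SET_PROGRESS.search(line)
        if match is None:
            continue
        value = match.group("progress").strip()
        if value == "None":
            # Assumption: "None" means only the message changed, so skip it.
            continue
        latest[match.group("task_id")] = float(value)
    return latest


log_excerpt = [
    "2025-12-10 11:20:58,202 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 0.1, progress_msg: 11:20:58 Page(1~100000001): Start to parse.",
    "2025-12-10 11:20:58,449 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: None, progress_msg: 11:20:58 Page(1~100000001): Generate 7 chunks",
    "2025-12-10 11:20:59,587 INFO 47 set_progress(3dd4f162d57711f0891a0242ac120006), progress: 1.0, progress_msg: 11:20:59 Page(1~100000001): Indexing done (0.10s). Task done (1.67s)",
]

print(last_logged_progress(log_excerpt))
```

If the value logged for a task is 1.0 while the matching row in the task table still shows a lower progress, that would support the observation that the write is not reaching the database.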

channlang avatar Dec 10 '25 04:12 channlang

This problem is hard to track down; the progress and progress log are no longer updated.

channlang avatar Dec 10 '25 07:12 channlang

What about deleting all Docker images and running docker compose again? As long as you don't remove the volumes, all your data will be preserved.

Magicbook1108 avatar Dec 10 '25 08:12 Magicbook1108

I have already tried that, and nothing changed. I even deleted the related table data in the database and the Redis queue and started over, but it's still the same.

channlang avatar Dec 11 '25 00:12 channlang