[Question]: How to get the progress bar
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
When I use the Python API to upload and parse files, I can't see the upload or parsing status. I have to keep checking the frontend to see whether the uploaded files have been parsed. How can I add a progress bar that shows the progress of document parsing?
RAGFlow's Python API and backend don't provide real-time progress updates for file uploads or parsing. Instead, the frontend shows progress by polling the backend for status updates. To add a progress bar in your Python workflow, you can implement a polling loop that periodically queries the document's status and progress fields via the API, then update your progress bar accordingly.
Here's a basic example of how you might do this:
```python
import time


def get_document_status(doc_id, client):
    # Replace with the actual API call to fetch the document
    doc_info = client.get_document(doc_id)
    # "run" holds the processing state; "status" is reserved for future use
    return doc_info['progress'], doc_info['run']


def show_progress_bar(progress):
    # Simple text progress bar; progress is a float from 0.0 to 1.0
    bar_length = 40
    filled_length = int(bar_length * progress)
    bar = '=' * filled_length + '-' * (bar_length - filled_length)
    print(f'\rParsing: |{bar}| {progress*100:.2f}%', end='', flush=True)


doc_id = 'your_document_id'
client = ...  # Your RAGFlow API client

while True:
    progress, run = get_document_status(doc_id, client)
    show_progress_bar(progress)
    if run == 'DONE' or progress >= 1.0:
        print('\nParsing complete!')
        break
    if run in ('FAIL', 'CANCEL'):
        # Stop instead of polling forever when parsing cannot finish
        print(f'\nParsing stopped: {run}')
        break
    time.sleep(2)  # Adjust polling interval as needed
```
You can adjust the polling interval for responsiveness vs. server load. If you want a richer UI, consider using libraries like tqdm for progress bars. Note that sometimes the progress may appear stuck due to backend resource limits or bugs—see this issue and this one for more context.
Currently, this polling approach is the main way to track progress programmatically, as real-time streaming updates are not supported by the backend or SDK out-of-the-box.
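Because progress can appear stuck, a polling loop that never gives up can hang a script indefinitely. The following is a minimal sketch of a poller with an overall timeout and a simple stall check. It assumes a hypothetical `fetch` callable that returns `(progress, run)`, for example a thin wrapper around the API call in the snippet above; the `timeout` and `stall_after` defaults are arbitrary illustrations, not values recommended by RAGFlow.

```python
import time
from typing import Callable, Tuple


def poll_progress(
    fetch: Callable[[], Tuple[float, str]],
    interval: float = 2.0,
    timeout: float = 600.0,
    stall_after: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch() until the run state is terminal, the timeout expires,
    or progress has not changed for stall_after seconds.

    fetch is a hypothetical callable returning (progress, run).
    Returns the final run state, "TIMEOUT", or "STALLED".
    """
    start = clock()
    last_progress = None
    last_change = start
    while True:
        progress, run = fetch()
        now = clock()
        if run in ("DONE", "FAIL", "CANCEL"):
            return run
        if progress != last_progress:
            # Progress moved, so reset the stall timer
            last_progress = progress
            last_change = now
        elif now - last_change >= stall_after:
            return "STALLED"
        if now - start >= timeout:
            return "TIMEOUT"
        sleep(interval)
```

Returning a distinct `"STALLED"` or `"TIMEOUT"` value lets the caller decide whether to check the server logs or the task executor instead of waiting forever. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.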
To reply, just mention @dosu.
@Jialiang123 Through API?
@Jialiang123 As @dosubot mentioned, this is a proper way to implement it.
According to the documentation, when you list the documents in a dataset, each document includes these progress-related fields:
- progress: float. The current processing progress as a fraction from 0.0 to 1.0. Defaults to 0.0.
- progress_msg: str. A message indicating the current progress status. Defaults to "".
- process_begin_at: datetime. The start time of document processing. Defaults to None.
- process_duration: float. Duration of the processing in seconds. Defaults to 0.0.
- run: str. The document's processing status:
  - "UNSTART" (default)
  - "RUNNING"
  - "CANCEL"
  - "DONE"
  - "FAIL"
- status: str. Reserved for future use.
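These fields can be wrapped in a small typed record so the polling code checks one place for "is it finished?". This is a sketch, not part of the RAGFlow SDK: `DocProgress` and `TERMINAL_RUN_STATES` are hypothetical names, and treating `DONE`, `FAIL`, and `CANCEL` as end states for polling is an assumption based on the `run` values listed above.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Assumption: polling can stop once "run" reaches one of these values
TERMINAL_RUN_STATES = {"DONE", "FAIL", "CANCEL"}


@dataclass
class DocProgress:
    """Hypothetical record holding the progress-related document fields."""
    progress: float = 0.0
    progress_msg: str = ""
    process_duration: float = 0.0
    run: str = "UNSTART"

    @classmethod
    def from_dict(cls, doc: Mapping[str, Any]) -> "DocProgress":
        # Fall back to the documented defaults when a key is missing
        return cls(
            progress=float(doc.get("progress", 0.0)),
            progress_msg=doc.get("progress_msg", ""),
            process_duration=float(doc.get("process_duration", 0.0)),
            run=doc.get("run", "UNSTART"),
        )

    @property
    def is_terminal(self) -> bool:
        # True once the document is no longer queued or running
        return self.run in TERMINAL_RUN_STATES

    @property
    def succeeded(self) -> bool:
        return self.run == "DONE"
```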
For example:
```python
{'chunk_count': 17,
 'chunk_method': 'presentation',
 'created_by': '5d958c84d31e11efbf4f0242c0a86402',
 'dataset_id': '3ce67b8c4cbe11f0a4d90242c0a86403',
 'id': '8b1f38c84d7e11f09d120242c0a86403',
 'name': 'xxx.pptx',
 'parser_config': {'auto_keywords': 0,
                   'auto_questions': 0,
                   'chunk_token_num': 512,
                   'delimiter': '\n',
                   'graphrag': {'use_graphrag': False},
                   'html4excel': False,
                   'layout_recognize': 'DeepDOC',
                   'raptor': {'use_raptor': False}},
 'process_begin_at': 'Fri, 20 Jun 2025 16:07:55 GMT',
 'process_duration': 0.0,
 'progress': 1.0,
 'progress_msg': '\n'
                 '16:07:55 Task has been received.\n'
                 '16:08:03 Page(1~100000001): Text extraction finished.\n'
                 '16:09:54 Page(1~100000001): Generate 17 chunks\n'
                 '16:09:55 Page(1~100000001): Embedding chunks (0.83s)\n'
                 '16:09:55 Page(1~100000001): Indexing done (0.18s). Task done '
                 '(119.91s)',
 'run': 'DONE',
 'size': 3875454,
 'source_type': 'local',
 'status': '1',
 'thumbnail': 'thumbnail_8b1f38c84d7e11f09d120242c0a86403.png',
 'token_count': 5858,
 'type': 'doc'}
```
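In this example, `progress_msg` is a newline-separated log where each line starts with an `HH:MM:SS` timestamp. A sketch for splitting it into entries, assuming that format holds in general (it is inferred from this single example, not from a documented contract); `parse_progress_msg` and `latest_message` are hypothetical helper names.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "16:07:55 Task has been received."
_LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_progress_msg(progress_msg: str) -> List[Tuple[Optional[str], str]]:
    """Split a progress_msg string into (timestamp, message) pairs.

    Lines without a leading HH:MM:SS timestamp are kept with timestamp None.
    """
    entries = []
    for raw in progress_msg.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines such as the leading "\n"
        match = _LINE_RE.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
        else:
            entries.append((None, line))
    return entries


def latest_message(progress_msg: str) -> str:
    """Return the most recent non-empty progress message, or ""."""
    entries = parse_progress_msg(progress_msg)
    return entries[-1][1] if entries else ""
```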
For details, see https://ragflow.io/docs/dev/python_api_reference#list-documents
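When several files are uploaded at once, one option is to poll each document and show the average progress. This sketch assumes a `dataset` object whose `list_documents(id=...)` returns a list of document objects exposing `.progress` and `.run`, as described in the List Documents reference; `wait_for_documents` and its `on_update` callback are hypothetical names, and counting documents in an end state as 100% of the aggregate bar is a design choice, not RAGFlow behavior.

```python
import time
from typing import Callable, Dict, Iterable

# The run values after which a document is no longer being processed
TERMINAL = {"DONE", "FAIL", "CANCEL"}


def wait_for_documents(
    dataset,
    doc_ids: Iterable[str],
    interval: float = 5.0,
    on_update: Callable[[float, Dict[str, str]], None] = lambda avg, runs: None,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every document reaches a terminal run state.

    Assumes dataset.list_documents(id=...) returns a list of objects with
    .progress and .run attributes. Returns {document id: final run state}.
    """
    doc_ids = list(doc_ids)
    while True:
        progress: Dict[str, float] = {}
        runs: Dict[str, str] = {}
        for doc_id in doc_ids:
            doc = dataset.list_documents(id=doc_id)[0]
            runs[doc_id] = doc.run
            if doc.run in TERMINAL:
                # Finished documents count as complete in the aggregate bar
                progress[doc_id] = 1.0
            else:
                # Clamp to [0, 1] defensively
                progress[doc_id] = min(max(doc.progress, 0.0), 1.0)
        average = sum(progress.values()) / len(progress) if progress else 1.0
        on_update(average, runs)
        if all(run in TERMINAL for run in runs.values()):
            return runs
        sleep(interval)
```

With the Python SDK, `dataset` would presumably come from something like `RAGFlow(api_key=..., base_url=...).list_datasets(name=...)[0]`; check the linked reference for the exact signatures. The `on_update` callback is where a text bar or a `tqdm` bar could be refreshed.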
Hope this helps.
Yes, that's correct—the fields you listed (progress, progress_msg, process_begin_at, process_duration, run, status) are the ones you should use to track document parsing progress via the Python API. The standard approach is to poll the API for these fields and update your progress bar accordingly. This is how the frontend implements progress tracking as well.
A few tips and caveats:
- The progress field (float, 0.0–1.0) is the main indicator for your progress bar.
- progress_msg gives you a human-readable status, including the queue position if tasks are waiting (example).
- run shows the current state: "UNSTART", "RUNNING", "CANCEL", "DONE", or "FAIL".
- If you see progress stuck at 0.0% or "Task is queued...", it may be due to resource limits, task queueing, or backend issues. See this FAQ and this issue for troubleshooting.
- Bugs or misconfigurations (like using MariaDB instead of MySQL, or a bad parser_config) can prevent progress from updating (example).
- If you want to show richer progress, you can also display progress_msg and run alongside your progress bar.
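To show `run` and `progress_msg` next to the bar, a single status line can be built from all three fields. A minimal sketch; `render_status` is a hypothetical helper, and showing only the last non-empty line of `progress_msg` assumes the message accumulates one entry per line, as in the example earlier in this thread.

```python
def render_status(progress: float, run: str, progress_msg: str, width: int = 30) -> str:
    """Build one status line: bar, percentage, run state, latest message."""
    # Clamp so out-of-range values (e.g. a negative progress) don't break the bar
    fraction = min(max(progress, 0.0), 1.0)
    filled = int(width * fraction)
    bar = "#" * filled + "." * (width - filled)
    # progress_msg accumulates lines; show only the most recent non-empty one
    lines = [line.strip() for line in progress_msg.splitlines() if line.strip()]
    latest = lines[-1] if lines else ""
    return f"[{bar}] {fraction * 100:5.1f}% {run:<7} {latest}".rstrip()
```

Inside a polling loop, `print("\r" + render_status(progress, run, progress_msg), end="", flush=True)` keeps the output on one line.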
If you run into issues where progress doesn't update, check the server logs, make sure the task executor is running, and consider restarting the backend if needed (example).
Let me know if you need a code example or run into any specific issues!
Appreciate @xinzhuang's help. Closing this issue for now. Feel free to reopen it if you have any further questions.