Feature/docs generator
Type of change
- [x] New Feature (non-breaking change which adds functionality)
What problem does this PR solve?
This PR introduces a new Docs Generator agent component for producing downloadable PDF, DOCX, or TXT files from Markdown content generated within a RAGFlow workflow.
Key Features
Backend
- New component: DocsGenerator (agent/component/docs_generator.py)
- Markdown → PDF/DOCX/TXT conversion
- Supports tables, lists, code blocks, headings, and rich formatting
- Configurable document style (fonts, margins, colors, page size, orientation)
- Optional header logo and footer with page numbers/timestamps
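To illustrate the simplest of the three targets, the Markdown → TXT path can be sketched with the standard library alone. This is a minimal sketch, not the code in agent/component/docs_generator.py: the function name `markdown_to_text` and the set of handled constructs are assumptions made for illustration.

```python
import re


def markdown_to_text(markdown: str) -> str:
    """Strip common Markdown markup and keep the readable text (hypothetical helper)."""
    out = []
    in_code = False
    for line in markdown.splitlines():
        # Fence lines toggle code mode and are dropped from the output.
        if line.strip().startswith("```"):
            in_code = not in_code
            continue
        # Code block contents are kept verbatim.
        if in_code:
            out.append(line)
            continue
        line = re.sub(r"^\s{0,3}#{1,6}\s+", "", line)          # heading markers
        line = re.sub(r"\*\*(.+?)\*\*", r"\1", line)           # bold
        line = re.sub(r"\*(.+?)\*", r"\1", line)               # italic (asterisks only)
        line = re.sub(r"`([^`]+)`", r"\1", line)               # inline code
        line = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", line)   # links: keep the label
        out.append(line)
    return "\n".join(out)
```

Underscore emphasis is deliberately left alone so that identifiers such as docs_generator survive unchanged; tables and nested lists would need a real Markdown parser.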
Frontend
- New configuration UI for the Docs Generator
- Download button integrated into the chat interface
- Output wired to the Message component
- Full i18n support
Documentation
- Added component guide: docs/guides/agent/agent_component_reference/docs_generator.md
Usage
Add the Docs Generator to a workflow, connect Markdown output from an upstream component, configure metadata/style, and feed its output into the Message component. Users will see a document download button directly in the chat.
Contributor Note
We have been following RAGFlow for more than a year and a half and have worked extensively on customizing the framework and integrating it into several of our internal systems. In that time, we have built multiple platforms that rely on RAGFlow as a core component, which has given us a strong appreciation for how flexible and powerful the project is.
We also previously contributed the full Italian translation, and we were glad to see it accepted. This new Docs Generator component was created for our own production needs, and we believe that it may be useful for many others in the community as well.
We want to sincerely thank the entire RAGFlow team for the remarkable work you have done and continue to do. If there are opportunities to contribute further, we would be glad to help whenever we have time available. It would be a pleasure to support the project in any way we can.
If appropriate, we would be glad to be listed among the project’s contributors, but in any case we look forward to continuing to support and contribute to the project.
PentaFrame Development Team
Much appreciated! Could you please share a screenshot, GIF, or something similar to demonstrate the scenario?
The block started out as a PDF Generator and was later extended to support DOCX and TXT files. We have used it a lot to generate reports and complex documents! I've attached the test agent JSON so you can import it and test it in your environment.
Hope you'll find it useful!
And this is what the block looks like. We are open to any edits, comments, or suggestions to improve it! Thanks to everyone. PentaFrame-Development
Cool! There're conflicts, please resolve them.
Wow, you've done so much work and transformation - amazing! I also noticed that you designed this massive agent workflow. I would like to sincerely ask you: when my agent workflow runs for too long, the page becomes unresponsive and I can't do any interaction until it turns into a blank page. This often causes my results to be lost and makes it difficult for others to use. I'm wondering if you've ever encountered this problem? Thank you!
@LingYi-Z01 We built it using the Begin component in 'TASK' mode (so it doesn't wait for a user query in the chat to start), and for us it works fine. We have never encountered this problem ourselves, but if you end up with a blank page, the browser dev console (F12) or the backend logs may show the error. Feel free to export and share your agent JSON so we can test it too. Or you can open a new discussion about it and I'll follow it from there!
@KevinHuSh Can I resolve the conflicts, or should I wait for @buua436's review?
Thx. PentaFrame-Development
Thank you for the feedback! I've addressed both issues:
1. ReportLab dependency: Added reportlab>=4.4.1 to pyproject.toml using uv add reportlab.
2. Multi-language support (black squares fix)
The Problem: ReportLab's TTFont has an issue where charToGlyph returns None for CJK characters even when the font file contains the glyphs. This causes missing characters (black squares) in generated PDFs for Chinese, Japanese, Korean, and other non-Latin scripts.
Our Solution: We implemented automatic font detection and switching using ReportLab's CID fonts:
- Latin-only content (English, Spanish, French, German, etc.) uses the user-selected font (Helvetica, Times-Roman, or Courier).
- Content containing CJK, Arabic, Hebrew, Thai, or Hindi automatically switches to the STSong-Light CID font.
This approach requires no user configuration: the system detects the content and chooses the appropriate font.
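The detection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from docs_generator.py: the names `NON_LATIN_RANGES`, `needs_cid_font`, and `pick_font` are hypothetical, and the Unicode ranges are one plausible reading of "CJK, Arabic, Hebrew, Thai, or Hindi".

```python
# Unicode blocks treated as non-Latin in this sketch (an assumption, not the PR's exact list).
NON_LATIN_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x097F),  # Devanagari (Hindi)
    (0x0E00, 0x0E7F),  # Thai
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
]

CID_FONT = "STSong-Light"  # CID font named in this PR


def needs_cid_font(text: str) -> bool:
    """Return True if any character falls in one of the non-Latin ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in NON_LATIN_RANGES)


def pick_font(text: str, user_font: str = "Helvetica") -> str:
    """Keep the user-selected font for Latin-only text, otherwise switch to the CID font."""
    return CID_FONT if needs_cid_font(text) else user_font
```

For example, `pick_font("Hello", "Courier")` keeps Courier, while `pick_font("你好", "Courier")` returns STSong-Light. In ReportLab, the CID font would also need to be registered before use, typically with `pdfmetrics.registerFont(UnicodeCIDFont("STSong-Light"))`.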
Changes made:
- pyproject.toml: Added reportlab>=4.4.1 dependency
- Dockerfile: Added fonts-freefont-ttf and fonts-noto-cjk packages
- agent/component/docs_generator.py: Implemented CJK detection and CID font switching
- docs/guides/agent/agent_component_reference/docs_generator.md: Added documentation about automatic font behavior
We are open to alternative solutions if you have a preferred approach for handling multi-language PDF generation. The current implementation prioritizes compatibility and automatic detection, but we can adjust based on project requirements.
CI failure.
Thank you for the updates — we've reviewed the changes and can confirm that the new feature works as expected.
There is just one more step needed: 👉 Please address the CI issues currently reported in the pipeline.
Once the CI checks pass, we will proceed to merge your PR.
Thanks again for your great contribution!
Done! Thank you all! When possible, please have a look at this discussion!
Have a nice day! PentaFrame-Development