[Bug]: WORD files could not be viewed in inference column.
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
N/A
RAGFlow image version
0.19.X
Other environment information
Windows 11 + Docker Desktop + LM Studio.
Actual behavior
I am using the RAGFlow knowledge-base Q&A system and have noticed an issue. For PDF documents in the knowledge base, the references cited in a Q&A response can be opened directly with a mouse click, and I can scroll through the pages with the mouse wheel to browse the entire document. For Word documents, however, I can only zoom the page in or out; the mouse wheel does not scroll through the pages. The text ends up too small to read, so the document cannot actually be viewed.
Expected behavior
Word files can be viewed the same way as PDF files, including scrolling through the pages with the mouse wheel.
Steps to reproduce
Upload PDF and Word files, use chat to have the LLM answer a question based on the files, then click a cited reference in the inference column.
Additional information
No response
Your experience matches the current limitations of RAGFlow: only PDF files and images support full preview and scrolling in the inference column. Word documents (.docx) are converted to HTML for display using the "mammoth" library, but this approach does not provide the same scrolling or page navigation features as the PDF viewer. As a result, you can only zoom in/out on Word files, and scrolling through pages with the mouse wheel is not supported, which can make the text difficult to read if the document is large or formatted for paging [discussion] [issue].
This is a known limitation and not a configuration or environment issue. Improvements for non-PDF file previews, including Word documents, are planned for future versions of RAGFlow, but there is no built-in workaround in version 0.19.X. If you need to view Word documents with paging and scrolling, you may need to open them outside of RAGFlow for now.
If you have development resources, enhancing the Docx component to support better scrolling or pagination would require significant changes, such as integrating a more advanced Word document viewer or improving the HTML rendering and container styling. Currently, the system relies on basic browser scrolling within a div, which does not replicate the PDF viewer's features [docx code].
Let me know if you need more technical details or want guidance on possible implementation approaches.
To reply, just mention my name @dosu.
Yes, please give me some advice on this problem.
Here’s technical advice and possible implementation approaches for improving Word document viewing in RAGFlow:
Currently, RAGFlow uses the "mammoth" library to convert Word (.docx) files to HTML, then renders the entire document as a single HTML block inside a scrollable container. This approach works for small documents but does not support paging, page navigation, or advanced scrolling features like the PDF viewer. All content is rendered at once, and there is no concept of logical pages or navigation UI, which makes large documents hard to read and navigate [docx code].
To enhance the Docx component for better scrolling or pagination, consider these approaches:
- Segment HTML Output into Pages: After converting the DOCX to HTML, post-process the HTML to detect logical page breaks (for example, using section breaks, headings, or explicit page breaks if present in the DOCX). Render each page as a separate container or component. Add navigation controls (next/previous, page numbers) and allow the user to jump between pages, similar to the PDF viewer.
- Virtualized Scrolling: For large documents, implement virtual scrolling (using libraries like react-window or react-virtualized) to only render the visible portion of the document. This improves performance and user experience, especially for very long documents.
- Advanced Word Viewers: Consider integrating a more advanced DOCX viewer that supports paging and navigation. Some open-source options include:
- Improve Container Styling: Adjust the CSS to better mimic page boundaries (for example, add visual page breaks, margins, or a "page" background). This can help users visually distinguish between logical pages even if full pagination is not implemented.
- Snap-to-Page Scrolling: Implement a scroll snapping feature so that when users scroll, the view snaps to the start of each logical page, improving readability.
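For the virtualized scrolling option in the list above, the core calculation can be sketched in a few lines of TypeScript. This is only an illustration, not code from RAGFlow or from react-window: `visibleRange` and `VisibleRange` are hypothetical names, and the sketch assumes every block of converted HTML has the same fixed height, which real documents rarely satisfy. A library with variable-size support would be the more realistic choice.

```typescript
// Hypothetical helper: none of these names come from the RAGFlow codebase.
interface VisibleRange {
  start: number; // index of the first block to render (inclusive)
  end: number; // index one past the last block to render (exclusive)
}

// Compute which fixed-height blocks intersect the viewport, plus an overscan
// margin on both sides so fast scrolling does not flash empty space.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  blockHeight: number,
  blockCount: number,
  overscan: number = 2,
): VisibleRange {
  if (blockHeight <= 0 || blockCount <= 0) {
    return { start: 0, end: 0 };
  }
  const first = Math.floor(scrollTop / blockHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / blockHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(blockCount, last + overscan),
  };
}
```

A scroll handler would call `visibleRange` with the container's current `scrollTop` and render only the blocks in `[start, end)`, offsetting them so the scrollbar still reflects the full document height.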
All of these approaches require significant changes to the current rendering logic and UI. The most direct path is to start by segmenting the HTML output into pages and adding basic navigation, then iteratively improve with virtualization and advanced UI features as needed.
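As a rough sketch of that first step, the converted HTML could be split on a page-break marker and paired with a small helper for next/previous navigation. Everything here is an assumption for illustration: `splitHtmlIntoPages`, `clampPage`, and the `<hr class="page-break">` marker are hypothetical and are not part of RAGFlow. Whether the converter can be made to emit such a marker should be verified first, and many DOCX files contain no explicit page breaks at all because Word computes pagination at layout time.

```typescript
// Assumption: page breaks appear in the converted HTML as this marker.
const PAGE_BREAK = '<hr class="page-break">';

// Split one HTML string into per-page fragments at each marker,
// dropping fragments that are empty after trimming.
function splitHtmlIntoPages(html: string, marker: string = PAGE_BREAK): string[] {
  const pages = html
    .split(marker)
    .map((page) => page.trim())
    .filter((page) => page.length > 0);
  // Always return at least one page so the viewer has something to render.
  return pages.length > 0 ? pages : [''];
}

// Clamp a requested page index into the valid range, for use by
// next/previous buttons or a "jump to page" input.
function clampPage(index: number, pageCount: number): number {
  if (pageCount <= 0) return 0;
  return Math.min(Math.max(index, 0), pageCount - 1);
}
```

A component could keep the current page index in state, render `pages[index]` inside the existing container, and pass `index + 1` or `index - 1` through `clampPage` when the user clicks next or previous.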
There are currently no built-in workarounds or alternative libraries already integrated into RAGFlow for this purpose, so any of these enhancements would require custom development [docx code].
If you need more detailed technical guidance on any of these approaches, let me know!
Thanks for your reply. I just tried Dify, and it can display all file types, so I hope you can improve the software.
I encountered the same issue on version 0.19.0, but scrolling works as expected with .docx files in 0.18.0. I checked the package-lock.json under the web directory for both versions, and they both use [email protected]. Also, the code in docx/index.tsx hasn't changed between 0.18.0 and 0.19.0. So it's unclear what change in 0.19.0 caused .docx scrolling to break. Any ideas or suggestions would be appreciated.