[Question]: Performance Comparison: Single File Upload vs. Multiple File Uploads in Ragflow
Describe your problem
Hi infiniflow Team,
I have a question regarding the performance and results when using Ragflow for file parsing and processing. Specifically, I am curious about the difference in outcomes when uploading files in two different ways:
1. Single File Upload: Consolidating all files (e.g., 100,000 TXT files) into a single TXT file (without altering the original data) and uploading it for parsing.
2. Multiple File Uploads: Uploading and parsing each of the 100,000 TXT files individually.
My main concerns are:
- Will the final results (e.g., parsed data, model outputs) be the same in both scenarios?
- Are there any differences in model performance (e.g., speed, accuracy, resource usage) between the two approaches?
- Are there any best practices or recommendations for handling large volumes of files in Ragflow?
I would appreciate any insights or guidance on this matter. Thank you for your time and support!
Best regards.
Multiple file uploads will accelerate the parsing procedure if you start multiple task executors.
For chunking methods like Book and Law, splitting a complete text file into pieces will lose some context information.
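To illustrate the two points above, here is a minimal Python sketch. It is not Ragflow's actual code: `chunk_text` is a hypothetical fixed-size chunker and `parse_documents` is a hypothetical helper that uses a thread pool as a stand-in for multiple task executors. It assumes each file is parsed as an independent task, and it shows that chunking a merged file can produce chunks that mix content from two unrelated documents.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text, size=20):
    """Toy fixed-size chunker (hypothetical stand-in for a real parser)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def parse_documents(docs, workers=4):
    """Parse each document as its own task so work can be spread across workers.

    docs: dict mapping a file title to its text.
    Returns a dict mapping each title to its list of chunks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_lists = pool.map(chunk_text, docs.values())
        return dict(zip(docs.keys(), chunk_lists))


# Two small example documents with meaningful (made-up) titles.
docs = {
    "invoice_2023.txt": "A" * 30,
    "contract_acme.txt": "B" * 30,
}

# Separate files: each document is chunked on its own, possibly in parallel.
separate = parse_documents(docs)

# Merged file: one task, and chunk boundaries ignore document boundaries.
merged = chunk_text("".join(docs.values()))
```

In this toy setup every chunk in `separate` belongs to exactly one titled document, while the second chunk of `merged` contains text from both documents. How much this matters in practice depends on the chunking method selected.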
Thank you so much for your prompt response and for maintaining this wonderful repository. I truly appreciate your time and effort in addressing my concerns. Currently, I have a large number of individual text documents that I need to upload as a knowledge base. Each file is a separate document. I'm considering whether to upload them as separate files or merge them into a single file. Would combining all the text content into one file affect the model's performance? I want to ensure that I'm providing the data in the most effective format for the model's learning. Could you please advise on the recommended approach for handling multiple text documents? Thank you again for your assistance and for sharing your expertise.
It is not necessary to merge the files. Also, give the files meaningful titles.