[Bug]: Issue when uploading a massive number of documents in one shot
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch name
main
Commit ID
ragflow-0.7.0
Other environment information
No response
Actual behavior
When many documents (600 or more) are uploaded at once, the application suddenly gets stuck in all configurations (local models or API).
Expected behavior
No response
Steps to reproduce
Upload 500 documents in one shot
Additional information
No response
Sorry, it's not designed for this kind of usage.
It would be nice if you considered this kind of usage in the application.
Is the system not designed to support more than 600 documents or to upload 600 documents simultaneously? Can the project handle more than 600 documents? If not, is there a paid enterprise version that will support thousands of documents?
> It would be nice if you considered this kind of usage in the application.
This feature is not an easy one from my point of view (maybe easy enough for somebody ^^). It is an offline task execution involving both the front end and the back end.
> Is the system not designed to support more than 600 documents or to upload 600 documents simultaneously? Can the project handle more than 600 documents? If not, is there a paid enterprise version that will support thousands of documents?
Of course it can handle 600 files. It can handle millions of documents, but obviously uploading all of those files through the web page is not a feasible approach.
The system can't perform well while uploading hundreds of files at once. But after all these files have been uploaded in small batches, it performs well, since the system architecture is flexible.
The important thing is that the software can handle millions of documents. It would be ideal if it were possible to put the documents into a directory and then have the system automatically load them in batches.
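In the meantime, something along those lines can be scripted against the HTTP API. A rough sketch; the endpoint path, auth header, and form field names below are assumptions, so check the actual RAGFlow API docs before using it:

```python
# Rough sketch: upload every file in a directory in small batches instead of
# one giant web-page upload. The endpoint path, auth header, and form field
# names are assumptions -- adjust them to the actual RAGFlow API.
import os
import time
import requests

API_URL = "http://localhost:9380/v1/document/upload"  # assumed endpoint
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}  # assumed auth scheme
KB_ID = "<your_knowledge_base_id>"
BATCH_SIZE = 20      # keep batches small so the server stays responsive
PAUSE_SECONDS = 2    # brief pause between batches

paths = sorted(
    os.path.join("docs", name) for name in os.listdir("docs")
    if os.path.isfile(os.path.join("docs", name))
)

for start in range(0, len(paths), BATCH_SIZE):
    for path in paths[start:start + BATCH_SIZE]:
        with open(path, "rb") as fh:
            resp = requests.post(
                API_URL,
                headers=HEADERS,
                data={"kb_id": KB_ID},
                files={"file": (os.path.basename(path), fh)},
            )
        resp.raise_for_status()
    print(f"uploaded {min(start + BATCH_SIZE, len(paths))}/{len(paths)}")
    time.sleep(PAUSE_SECONDS)
```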
Tested the upload feature of the web app interface locally; it takes an eternity, to be honest (I guess because it's uploading/parsing the files one after the other?). It takes around 4-5 minutes for a 20-page PDF. It would be great to know whether it would run faster by simply using a GPU, and also whether it makes sense / is possible to parse documents in parallel.
Check out entrypoint.sh. Starting multiple task_executor.py processes will make it parallel. A GPU will accelerate embedding, which is really slow on CPU.
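For reference, a minimal sketch of what launching several executors looks like. The script path and the numeric worker-id argument follow the pattern used in entrypoint.sh, but verify both against your ragflow version:

```python
# Minimal sketch: spawn N task-executor worker processes, each with its own
# worker id. That task_executor.py accepts a numeric id argument is an
# assumption based on how entrypoint.sh invokes it.
import subprocess
import sys

WORKERS = 4  # try one worker per CPU core you can spare

procs = [
    subprocess.Popen([sys.executable, "rag/svr/task_executor.py", str(i)])
    for i in range(WORKERS)
]
for p in procs:
    p.wait()
```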