
[Bug]: Issue when uploading massive documents in one shot

Open rrodriguezlo opened this issue 1 year ago • 7 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

ragflow-0.7.0

Other environment information

No response

Actual behavior

When trying to upload many documents (600 or more) in one shot, the application suddenly gets stuck with all configurations (local models or API).

Expected behavior

No response

Steps to reproduce

Upload 500 documents in one shot

Additional information

No response

rrodriguezlo avatar Jun 28 '24 09:06 rrodriguezlo

Sorry, it's not designed for this kind of usage.

KevinHuSh avatar Jul 01 '24 01:07 KevinHuSh

It would be nice if you considered this kind of usage in the application.

rrodriguezlo avatar Jul 01 '24 05:07 rrodriguezlo

Is the system not designed to support more than 600 documents or to upload 600 documents simultaneously? Can the project handle more than 600 documents? If not, is there a paid enterprise version that will support thousands of documents?

octoberweb69 avatar Jul 01 '24 21:07 octoberweb69

It would be nice if you considered this kind of usage in the application.

This feature is not an easy one from my point of view (maybe easy enough for somebody ^^). It involves off-line task execution on both the front end and the back end.

KevinHuSh avatar Jul 02 '24 01:07 KevinHuSh

Is the system not designed to support more than 600 documents or to upload 600 documents simultaneously? Can the project handle more than 600 documents? If not, is there a paid enterprise version that will support thousands of documents?

Of course it can handle 600 files. It can handle millions of documents, but obviously uploading these files through the web page is not a feasible solution.

The system can't perform well while uploading hundreds of files at once. But after all these files have been uploaded in small batches, it performs well, since the system architecture is flexible.

KevinHuSh avatar Jul 02 '24 01:07 KevinHuSh

The important thing is that the software can handle millions of documents. It would be ideal if it were possible to upload the documents to a directory and then have the system automatically load them in batches.

octoberweb69 avatar Jul 02 '24 05:07 octoberweb69
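For anyone hitting the same limitation, here is a minimal batch-upload sketch along the lines of the two comments above: it walks a local directory and pushes files to the server in small batches instead of through the web page. The endpoint path, port, auth header, and form field names below are assumptions for illustration, not RAGFlow's confirmed API; check the HTTP API docs of your version for the exact upload route.

```python
# Sketch: upload a directory of documents in small batches instead of via the
# web page. The URL, Authorization scheme, and form field names are assumed
# for illustration; adapt them to your RAGFlow version's documented API.
import time
from pathlib import Path

import requests

RAGFLOW_URL = "http://localhost:9380/api/document/upload"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                                   # assumed auth token
KB_NAME = "my_knowledge_base"                              # assumed target KB
BATCH_SIZE = 20        # keep batches small, as suggested above
PAUSE_SECONDS = 10     # let parsing/embedding tasks drain between batches


def upload_one(path: Path) -> None:
    with path.open("rb") as fh:
        resp = requests.post(
            RAGFLOW_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"kb_name": KB_NAME},
            files={"file": (path.name, fh)},
            timeout=120,
        )
    resp.raise_for_status()


def main() -> None:
    files = sorted(Path("./docs_to_upload").glob("*.pdf"))
    for i in range(0, len(files), BATCH_SIZE):
        for path in files[i : i + BATCH_SIZE]:
            upload_one(path)
        print(f"uploaded {min(i + BATCH_SIZE, len(files))}/{len(files)}")
        time.sleep(PAUSE_SECONDS)  # throttle so the server isn't flooded


if __name__ == "__main__":
    main()
```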

Tested the upload feature of the web app interface locally; it takes an eternity to be honest (I guess because it's uploading/parsing the files one after the other?). It takes around 4-5 minutes for a 20-page PDF. It would be great to know whether it would run faster by simply using a GPU, and also whether it makes sense / is possible to parse documents in parallel.

SaidKhudoyan avatar Jul 11 '24 10:07 SaidKhudoyan

Check out entrypoint.sh. Starting multiple task_executor.py processes will make it parallel. A GPU will accelerate embedding, which is really slow on CPU.

KevinHuSh avatar Jul 17 '24 11:07 KevinHuSh
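As an illustration of the parallelism described in the comment above, here is a minimal launcher sketch that starts several task executors as separate processes. The script path rag/svr/task_executor.py and the numeric worker-id argument are assumptions about how the executor is invoked; entrypoint.sh in your checkout is the reference.

```python
# Sketch: run several task_executor.py workers in parallel so documents are
# parsed/embedded concurrently. The script path and the worker-id argument
# are assumptions; confirm against entrypoint.sh in your RAGFlow checkout.
import subprocess
import sys

NUM_WORKERS = 4  # tune to available CPU/GPU capacity


def main() -> None:
    procs = [
        subprocess.Popen(
            [sys.executable, "rag/svr/task_executor.py", str(worker_id)]
        )
        for worker_id in range(NUM_WORKERS)
    ]
    # The executors normally run until the service is stopped.
    for p in procs:
        p.wait()


if __name__ == "__main__":
    main()
```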