
Feat: add multi-GPU and parallel processing support for OCR

Open debugdoctor opened this issue 9 months ago • 5 comments

What problem does this PR solve?

Add multi-GPU and parallel processing support for OCR

Type of change

  • [x] New Feature (non-breaking change which adds functionality)

debugdoctor avatar Mar 06 '25 04:03 debugdoctor

Looks generally good. However, deepdoc OCR is not an obvious bottleneck. If it is, I would prefer running it with the Trio thread pool. Refactoring the chunk function (rag/app/naive.py) to async is not easy, but it would be amazing.

Hello, thanks for your suggestions. I'd like to give it a try.

debugdoctor avatar Mar 06 '25 09:03 debugdoctor

@yuzhichang I've switched the thread pool to Trio's. Please review it again. (The screenshots show testing with non-contiguous GPU settings.) [Screenshots: setting, nvidia-smi]

debugdoctor avatar Mar 07 '25 04:03 debugdoctor
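
The thread does not show how the PR selects GPUs, so the following is only an illustrative sketch of what a "non-contiguous GPU setting" could mean in practice. The comma-separated setting format and the names `parse_device_ids` and `assign_pages` are assumptions made for this example, not part of the PR.

```python
from itertools import cycle


def parse_device_ids(setting: str) -> list[int]:
    """Parse a comma-separated device list such as "0,2,5" (assumed format)."""
    return [int(part) for part in setting.split(",") if part.strip()]


def assign_pages(num_pages: int, device_ids: list[int]) -> dict[int, list[int]]:
    """Distribute page indices round-robin over the given device ids."""
    if not device_ids:
        raise ValueError("at least one device id is required")
    plan: dict[int, list[int]] = {dev: [] for dev in device_ids}
    # cycle() repeats the device list, so ids need not be 0..N-1.
    for page, dev in zip(range(num_pages), cycle(device_ids)):
        plan[dev].append(page)
    return plan


if __name__ == "__main__":
    print(assign_pages(5, parse_device_ids("0,2,5")))
```

Keying the plan by the actual device id, rather than by position in the list, is what lets gaps in the numbering pass through unchanged.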

@debugdoctor Great job! The only issue is CI failure. Could you fix it?

yuzhichang avatar Mar 07 '25 11:03 yuzhichang

@debugdoctor Great job! The only issue is CI failure. Could you fix it?

Thanks, let me fix it.^_^

debugdoctor avatar Mar 07 '25 11:03 debugdoctor

@yuzhichang Checks passed, please review it again.

debugdoctor avatar Mar 08 '25 01:03 debugdoctor

@debugdoctor This PR passed CI and was merged, but I found that it breaks PDF parsing, so I have reverted the merge. Please resolve my new comments, test, and open another PR. Thanks!

Thanks for your review. I will test it fully before committing.

debugdoctor avatar Mar 11 '25 13:03 debugdoctor