kotaemon

feat: integrate got-ocr2.0 as image reader

Open phv2312 opened this issue 1 year ago • 6 comments

Description

  • Integrate got-ocr2.0 OCR as an image reader
  • Add a new extension manager to easily switch between the supported loaders
  • Also, thanks to @cin-jimmy for his suggestion on GitHub stale (issue)
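
For readers unfamiliar with the idea, the extension manager described above can be pictured as a registry that maps each file extension to several named loaders and remembers which one is active. The following is only a minimal sketch under that assumption, not the code in this PR: the class `ExtensionManager`, its methods, and the loader names `"default"` and `"got-ocr"` are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical loader type: takes a file path and returns extracted text.
Loader = Callable[[str], str]


class ExtensionManager:
    """Illustrative registry: file extension -> named loaders, one active."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Dict[str, Loader]] = {}
        self._active: Dict[str, str] = {}

    def register(self, ext: str, name: str, loader: Loader) -> None:
        """Add a loader for an extension; the first one registered is the default."""
        ext = ext.lower()
        self._loaders.setdefault(ext, {})[name] = loader
        self._active.setdefault(ext, name)

    def supported(self, ext: str) -> List[str]:
        """Return the names of all loaders registered for an extension."""
        return sorted(self._loaders.get(ext.lower(), {}))

    def set_loader(self, ext: str, name: str) -> None:
        """Switch the active loader for an extension."""
        ext = ext.lower()
        if name not in self._loaders.get(ext, {}):
            raise KeyError(f"no loader named {name!r} for {ext!r}")
        self._active[ext] = name

    def load(self, path: str) -> str:
        """Dispatch a file to the active loader for its extension."""
        ext = Path(path).suffix.lower()
        if ext not in self._active:
            raise ValueError(f"unsupported extension: {ext!r}")
        return self._loaders[ext][self._active[ext]](path)


# Stand-in loaders that only tag the path; a real loader would parse the file.
manager = ExtensionManager()
manager.register(".png", "default", lambda p: "default:" + p)
manager.register(".png", "got-ocr", lambda p: "got-ocr:" + p)
manager.set_loader(".png", "got-ocr")
```

Keeping the active choice per extension in a single place is one way to make sure every settings screen reads and writes the same value.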

Type of change

  • [x] New features (non-breaking change).
  • [ ] Bug fix (non-breaking change).
  • [ ] Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • [x] I have performed a self-review of my code.
  • [ ] I have added thorough tests if it is a core feature.
  • [ ] There is a reference to the original bug report and related work.
  • [ ] I have commented on my code, particularly in hard-to-understand areas.
  • [ ] The feature is well documented.

phv2312 avatar Oct 02 '24 15:10 phv2312

@phv2312, can you add a docker-compose file that allows choosing the Docker image for the OCR service? I think it will help people test more easily.

cin-niko avatar Oct 04 '24 08:10 cin-niko

Hi @taprosoft @cin-niko. Sorry for the lack of updates for so long. Could you help review this PR again?

phv2312 avatar Oct 26 '24 04:10 phv2312

Hi @cin-niko and @taprosoft. I have updated the PR according to niko's comments and rebased onto the latest master. Could you help check this PR again?

phv2312 avatar Dec 15 '24 11:12 phv2312

@phv2312 Overall it looks good, but setting the loader through the extensions feature doesn't seem to work. For example:

  • Set pdf loader in Settings -> Retrieval Settings -> File loader: Works
  • Set pdf loader in Settings -> Loader settings -> Loader .pdf: Doesn't work

cin-niko avatar Dec 16 '24 05:12 cin-niko

@phv2312 Sorry for the late comment. Overall the logic is fine, but the current settings UI is a bit cluttered. I will push a small change to improve this before merging.

taprosoft avatar Dec 16 '24 06:12 taprosoft

OCR for PDFs needs to work as well. When I upload a PDF created by a scanner, the scanner has put the scanned pages into a multi-page PDF. I need to be able to upload such a PDF, have OCR extract the text, and then use that extracted text for reasoning.

heapsoftware avatar Jun 04 '25 17:06 heapsoftware