Feature Request: Add Paddle OCR recognition to Bookworm

Open cary-rowen opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Neither Windows 10 OCR nor Tesseract OCR gives good recognition results for Simplified Chinese text.

Describe the solution you'd like

There is an open-source project called Paddle OCR that supports multiple languages. In the Chinese and English scenarios I tested, its recognition rate exceeded that of Windows OCR and Tesseract OCR. It would be nice if Bookworm could add a Paddle OCR recognition engine.

Describe alternatives you've considered

None

Additional context

Screen readers in China already ship it as part of their distributions, and so far everything seems to be working fine. Paddle OCR repo: https://github.com/PaddlePaddle/PaddleOCR/

cary-rowen avatar Mar 13 '22 03:03 cary-rowen

Hello @cary-rowen, I investigated adding this OCR engine to Bookworm. The main roadblock is that adding it would increase the bundle size significantly. How do those screen readers embed this engine? How much has their bundle size increased since adding it? Perhaps there is another, simpler way to embed this engine that I'm not aware of.

Best, Musharraf

mush42 avatar Mar 13 '22 05:03 mush42

Hi @mush42

As far as I know, those screen reader bundles add less than 20 MB. Paddle OCR offers different recognition models, so we might consider adding a lightweight built-in model to Bookworm. I will look for more useful information for you. Thanks

cary-rowen avatar Mar 13 '22 06:03 cary-rowen

Hello @cary-rowen

I've been studying Paddle OCR and the ways it can be added to Bookworm without bringing in a huge number of additional dependencies.

The major issue is that most of the development documentation is written in Chinese, but through Google Translate, I was able to understand the basics of the process.

Paddle OCR can be embedded in one of two ways:

  1. Use the Paddle OCR C++ interface to create a Python binding: The benefit is speed, because all of the processing happens in C++. The Python bindings would also be reusable; for instance, we could create an NVDA add-on based on them. The major downside is that it requires a lot of time for initial development and testing.

  2. Use ONNX Runtime: Paddle has official support for ONNX Runtime, but I couldn't find any official confirmation from the Paddle developers as to whether all of the models support it. ONNX Runtime is fairly fast, but a major downside is that the majority of the OCR processing happens in Python, which is slower.
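To give a sense of the kind of processing that would stay in Python under the second option, here is a minimal sketch of greedy CTC decoding, a step commonly used to turn a text-recognition model's per-timestep scores into a string. This is only an illustration, not Paddle OCR's actual code: the function name `ctc_greedy_decode`, the tiny character set, and the choice of class index 0 as the blank symbol are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumption for this sketch: class index 0 is the CTC "blank" symbol,
# and class index i (i >= 1) maps to charset[i - 1].
BLANK = 0


def ctc_greedy_decode(scores: Sequence[Sequence[float]], charset: str) -> str:
    """Greedy CTC decoding: argmax per step, collapse repeats, drop blanks."""
    chars: List[str] = []
    previous = BLANK
    for step in scores:
        # Pick the highest-scoring class for this time step.
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only if it is not blank and not a repeat
        # of the class chosen at the previous time step.
        if best != BLANK and best != previous:
            chars.append(charset[best - 1])
        previous = best
    return "".join(chars)


# Hypothetical scores for a two-character alphabet "ab" (plus blank).
example_scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.8, 0.1],    # a again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.7, 0.2],    # a
    [0.1, 0.2, 0.7],    # b
]
print(ctc_greedy_decode(example_scores, "ab"))
```

A loop like this runs once per time step for every recognized text line, which is the sort of per-element Python work that the C++ route would avoid.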

The above technical details are notes to my future self, and for other interested parties.

Best, Musharraf

mush42 avatar Mar 22 '22 19:03 mush42

Hi @mush42, great to hear you're working on this. I will try to ask the authors of Paddle OCR whether all models support ONNX Runtime. Btw, I really like the advantages of the first option, since it could let more projects use Paddle OCR, especially an NVDA add-on. All in all, good luck to you.

Thanks

cary-rowen avatar Mar 23 '22 02:03 cary-rowen