docling: Improve OCR accuracy and reduce processing time

What other options do I have to improve OCR accuracy and reduce processing time? Right now there are not many options for fine-tuning, and one page takes 16 s on a Google Colab GPU. Is there any way to get this under 10 s?

Dec 09 '24 09:12 ninedesu

This is odd. I get about 1 page/s on a Google C4 CPU instance using four cores. EasyOCR is a bit slower than RapidOCR.

Dec 09 '24 09:12 simjak
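The timings quoted in this thread may not be directly comparable. One possible reason (an assumption, not something confirmed here) is that the first conversion in a session also pays for model download and loading. A minimal sketch of a timing harness that discards warm-up calls is below. `time_per_page` and its `convert` argument are hypothetical names, not part of docling; `convert` stands for whatever function runs your configured converter on one page.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_per_page(
    convert: Callable[[str], object],
    pages: Sequence[str],
    warmup: int = 1,
) -> Dict[str, float]:
    """Time `convert` on each page and ignore the first `warmup` calls."""
    durations = []
    for index, page in enumerate(pages):
        start = time.perf_counter()
        convert(page)  # hypothetical: wraps your OCR pipeline call
        elapsed = time.perf_counter() - start
        if index >= warmup:
            durations.append(elapsed)
    if not durations:
        raise ValueError("need more pages than warm-up calls")
    return {
        "pages": len(durations),
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
    }
```

Running the same page set through each OCR backend with this harness gives steady-state numbers that can be compared across machines.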

I tried again and got 9 s with EasyOCR and 6 s with RapidOCR. The code runs fine on Google Colab, but I hit this error when running it locally in JupyterLab.

"---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 7
      5 from docling.models.tesseract_ocr_cli_model import TesseractCliOcrOptions
      6 from docling.models.tesseract_ocr_model import TesseractOcrOptions
----> 7 from docling.models.rapid_ocr_model import RapidOcrOptions
      8 from docling.models.easyocr_model import EasyOcrOptions

ModuleNotFoundError: No module named 'docling.models.rapid_ocr_model'"

Dec 10 '24 02:12 ninedesu
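An import that works on Colab but fails locally often points to different installed versions of the package, though the thread does not confirm that this is the cause here. The sketch below uses only the standard library to print the installed docling version and to find which module, if any, exposes `RapidOcrOptions`. The helper names are hypothetical. The second candidate path, `docling.datamodel.pipeline_options`, is an assumption based on docling's published examples and may differ between versions.

```python
import importlib
from importlib import metadata
from typing import Optional, Sequence

# First entry is the path from the traceback above.
# Second entry is an assumption taken from docling's examples.
CANDIDATES = (
    "docling.models.rapid_ocr_model",
    "docling.datamodel.pipeline_options",
)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_symbol(name: str, modules: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first module path in `modules` that exposes `name`."""
    for path in modules:
        try:
            module = importlib.import_module(path)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        if hasattr(module, name):
            return path
    return None


if __name__ == "__main__":
    print("docling version:", installed_version("docling"))
    print("RapidOcrOptions found in:", find_symbol("RapidOcrOptions"))
```

Running this in both environments and comparing the output shows whether the local install is older than the one on Colab, and which import path works locally.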

@ninedesu Since docling now provides full control over GPU accelerators, this should no longer be a concern. Please re-open this issue if you still run into problems. Thanks.

Jan 31 '25 08:01 cau-git