docling
Standalone EasyOCR gives much better results than EasyOCR used through docling [tested with Vietnamese]
Bug
It's weird: when I use EasyOCR on Hugging Face or on the demo site, the result is much better than with docling. I do not understand what is happening. I am trying to debug the code, and I would also like to hear from the developers why this occurs.
Reproduce
Here is my code
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
pipeline_options.ocr_options.use_gpu = False
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.ocr_options.lang = ["vi"]

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options, backend=PyPdfiumDocumentBackend
        ),
        InputFormat.IMAGE: PdfFormatOption(
            pipeline_options=pipeline_options, backend=PyPdfiumDocumentBackend
        ),
    }
)

test2 = doc_converter.convert("/home/jonas/Pictures/Screenshots/1.png")
print(test2.document.export_to_dict())
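For a like-for-like comparison, the standalone side can also be run locally on the same image instead of through the hosted demo. The following is a minimal sketch, assuming the `easyocr` package is installed. The helper name `read_with_standalone_easyocr` is hypothetical, and the defaults mirror the docling settings above (`vi`, no GPU). The preprocessing is not guaranteed to match the hosted demo or docling's internal pipeline.

```python
def read_with_standalone_easyocr(image_path, languages=("vi",), use_gpu=False):
    """Run EasyOCR directly on an image and return the recognized strings.

    Hypothetical helper for comparison only; it does not reproduce any
    preprocessing (scaling, cropping) that docling may apply before OCR.
    """
    # Imported lazily so this module can be loaded without EasyOCR installed.
    import easyocr

    reader = easyocr.Reader(list(languages), gpu=use_gpu)
    # detail=0 returns only the recognized text, without boxes or confidences.
    return reader.readtext(image_path, detail=0)
```

Calling `read_with_standalone_easyocr("/home/jonas/Pictures/Screenshots/1.png")` and comparing the returned strings with the `text` fields in docling's export would show whether the gap also appears outside the hosted demo.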
I/O:
- Input
- Output in docling: https://justpaste.it/gmk92
Sample of the recognized text (not accurate): 'orig': 'BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI', 'text': 'BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI', 'level': 1},
- EasyOCR demo: https://www.jaided.ai/easyocr
The recognized text is more accurate.
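To make "more accurate" concrete, the two outputs can be compared character by character. The sketch below uses only the Python standard library. The reference string is an assumption (the expected spelling of the heading, with `BỘ` and `HỘI`), not a value copied from the demo output, and `char_diffs` is a hypothetical helper name.

```python
import difflib
import unicodedata


def char_diffs(ocr_text, reference):
    """Return (ocr_fragment, reference_fragment) pairs where the texts differ."""
    # Normalize to NFC so precomposed and decomposed Vietnamese letters compare equal.
    a = unicodedata.normalize("NFC", ocr_text)
    b = unicodedata.normalize("NFC", reference)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Text from the docling output quoted above.
docling_text = "BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI"
# Assumed reference text (not taken from the issue).
reference_text = "BỘ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỘI"

diffs = char_diffs(docling_text, reference_text)
for got, expected in diffs:
    print(f"docling: {got!r}  expected: {expected!r}")
```

Under that assumption, both mismatches are a dropped circumflex (`Ọ` instead of `Ộ`), which suggests the error is in diacritic recognition rather than in text detection or layout.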
Docling version
2.7.0
Python version
3.10
EasyOCR version
1.7.2