Standalone EasyOCR gives much better results than EasyOCR inside docling [tested with Vietnamese]

Open jonaskahn opened this issue 3 months ago • 3 comments

Bug

It's weird: when I use EasyOCR on Hugging Face or on the demo site, the result is much better than with docling. I don't understand what is happening. I am trying to debug the code, but I would also like an answer from the developers.

Reproduce

Here is my code

from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
pipeline_options.ocr_options.use_gpu = False
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.ocr_options.lang = ["vi"]

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options, backend=PyPdfiumDocumentBackend
        ),
        InputFormat.IMAGE: PdfFormatOption(
            pipeline_options=pipeline_options, backend=PyPdfiumDocumentBackend
        )
    }
)

test2 = doc_converter.convert("/home/jonas/Pictures/Screenshots/1.png")
print(test2.document.export_to_dict())

I/O:

  • Input Test-image
  • Output in docling: https://justpaste.it/gmk92

Sample of the returned text (not accurate): 'orig': 'BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI', 'text': 'BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI', 'level': 1},

  • EasyOCR demo: https://www.jaided.ai/easyocr The text it returns is more accurate.
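
To make the comparison reproducible outside the demo site, here is a minimal sketch that runs EasyOCR directly on the same image and lists the words that differ from the docling output. The helper names `read_with_easyocr` and `mismatched_words` are hypothetical, and `assumed_heading` is my assumption about what the heading in the image says (it is not taken from the demo output), so treat it as an illustration rather than a verified result.

```python
import unicodedata


def read_with_easyocr(image_path, langs=("vi",)):
    """Run EasyOCR directly on an image and return the recognized strings."""
    # Imported lazily because easyocr pulls in torch and loads model weights.
    import easyocr

    reader = easyocr.Reader(list(langs), gpu=False)
    # detail=0 returns only the text of each detected region.
    return reader.readtext(image_path, detail=0)


def mismatched_words(docling_text, reference_text):
    """Return (docling_word, reference_word) pairs that differ, by position.

    Both strings are NFC-normalized first so that composed and decomposed
    Vietnamese diacritics compare equal. zip() stops at the shorter string,
    so this is only meaningful when both have the same number of words.
    """
    a = unicodedata.normalize("NFC", docling_text).split()
    b = unicodedata.normalize("NFC", reference_text).split()
    return [(x, y) for x, y in zip(a, b) if x != y]


# Heading as returned by docling (copied from the output above).
docling_heading = "BỌ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỌI"
# ASSUMPTION: the expected heading; replace it with the standalone result,
# e.g. " ".join(read_with_easyocr("/home/jonas/Pictures/Screenshots/1.png")).
assumed_heading = "BỘ LAO ĐỘNG THƯƠNG BINH VÀ XÃ HỘI"

print(mismatched_words(docling_heading, assumed_heading))
```

Under that assumption, only the first and last words differ, and both differ by a single diacritic, which would point at the recognition step rather than at layout or reading order.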

Docling version

2.7.0

Python version

3.10

EasyOCR version

1.7.2


jonaskahn, Nov 25 '24 11:11