Christoph Auer comments

Results: 170 comments of Christoph Auer

Arabic OCR is not working

This should be working as suggested by @werruww. Please reopen if you still see issues.

Docling having issue processing this font in pdf

@Arslan-Mehmood1 Some PDFs simply have garbled text layers like these, which cannot be recovered. Some strategies that could help: 1. Check what you get when using our docling-parse-v2 or our pypdfium...
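When comparing what different backends extract, a rough numeric signal can help. The sketch below is a hypothetical heuristic, not a Docling feature: it scores extracted text by the share of characters that typically signal a broken text layer (the U+FFFD replacement character, private-use code points, stray control/format characters). It will not catch layers where a bad font mapping yields plausible-looking but wrong letters, and the 0.2 threshold is an arbitrary illustrative value.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction debris.

    Hypothetical heuristic: a character is suspicious if it is U+FFFD, lies in a
    private-use area (category "Co"), or is a control/format character
    (categories "Cc", "Cf").
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cc", "Cf")
    )
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    # The cutoff is illustrative only; tune it on your own documents.
    return garbled_ratio(text) >= threshold
```

Running this on the text each backend returns for the same page gives a quick, if crude, way to see which one (if any) produces usable output.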

[Bee] Table Processing is variable for one document

@divekarsc Could you please outline what code you run and how you measure it to obtain isolated timings for TableFormer?
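One way to get isolated timings is to wrap only the step of interest in a callable, discard a few warm-up runs (which absorb one-off costs such as model loading), and then report several repeated measurements. A minimal sketch; `time_stage` and the `stage` callable are hypothetical names, not part of Docling. If the stage runs on a GPU, the work may be asynchronous, so the callable should block until the device has finished before returning.

```python
import statistics
import time
from typing import Callable, Dict


def time_stage(
    stage: Callable[[], object], warmup: int = 2, repeats: int = 5
) -> Dict[str, float]:
    """Time one pipeline stage in isolation.

    `stage` is a hypothetical zero-argument callable that runs only the step
    being measured (e.g. table structure prediction on pre-cropped inputs),
    so parsing, layout analysis and OCR stay out of the measurement.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up runs are executed but not recorded.
    for _ in range(warmup):
        stage()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        stage()
        samples.append(time.perf_counter() - start)
    return {
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

Reporting min/median/max rather than a single run makes it easier to tell genuine variability in the stage apart from noise.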

docling vs GROBID

@sdspieg Great to see your investigation into docling and GROBID. Let me first answer a few of your points: > 1. Is My Setup Fair and Correct? I was...

Allowing EasyOCR to use the recog_network parameter

@itsainii That looks interesting! Could you please make a pull request with the code changes? Many thanks.

Standalone version of EasyOCR giving much better result than using EasyOCR in docling [ tested with Vietnamese ]

@jonaskahn I re-checked this, and I can see that many of the predicted text cells in EasyOCR come out with very low confidence. Can you please give a minimal code...
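To see how much of the OCR output is affected by low confidence, one can partition the recognized cells at a cutoff and look at the share that falls below it. A minimal sketch, assuming entries in the `(bbox, text, confidence)` shape that EasyOCR's `readtext` returns with `detail=1`; the function names and any threshold you pass are illustrative assumptions, not Docling's settings.

```python
from typing import List, Sequence, Tuple

# One EasyOCR `readtext` entry with detail=1: (bounding-box corners, text, confidence).
OcrEntry = Tuple[Sequence[Sequence[float]], str, float]


def split_by_confidence(
    entries: List[OcrEntry], threshold: float
) -> Tuple[List[OcrEntry], List[OcrEntry]]:
    """Partition entries into those at or above `threshold` and those below it."""
    kept: List[OcrEntry] = []
    dropped: List[OcrEntry] = []
    for entry in entries:
        _bbox, _text, confidence = entry
        (kept if confidence >= threshold else dropped).append(entry)
    return kept, dropped


def low_confidence_share(entries: List[OcrEntry], threshold: float) -> float:
    """Fraction of entries that a cutoff of `threshold` would discard."""
    if not entries:
        return 0.0
    _kept, dropped = split_by_confidence(entries, threshold)
    return len(dropped) / len(entries)
```

Comparing this share between the standalone EasyOCR run and the run inside docling, on the same page, would show whether the two setups differ mainly in which cells survive.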

Standalone version of EasyOCR giving much better result than using EasyOCR in docling [ tested with Vietnamese ]

Closing this because of inactivity. Please feel free to reopen if there is further demand.

`page_range` parameter stops prematurely at page 32 when starting from page 30+

@Ouassim-Hamdani Thanks for reporting this and for checking the Copilot PR. I also don't believe it solved the problem, and I will look into it myself now.

`page_range` parameter stops prematurely at page 32 when starting from page 30+

@Ouassim-Hamdani I think there might be a different problem here. I was checking with a large document (https://api.printnode.com/static/test/pdf/a4_500_pages.pdf) and I get the correct pages out (range 30, 35); however, in a...
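To make "the correct pages" concrete, the expected output of a range can be written down as a small reference function and compared against the page numbers actually returned. A sketch under the assumption that the range is 1-based and inclusive on both ends, with the end clamped to the document length; `select_pages` is a hypothetical helper, not a Docling function.

```python
from typing import List, Tuple


def select_pages(num_pages: int, page_range: Tuple[int, int]) -> List[int]:
    """Expected 1-based page numbers for an inclusive (start, end) range.

    Assumption for illustration: both bounds are 1-based and inclusive, and
    `end` is clamped to the number of pages in the document.
    """
    start, end = page_range
    if start < 1 or end < start:
        raise ValueError(f"invalid page_range {page_range!r}")
    return list(range(start, min(end, num_pages) + 1))
```

With a 500-page document, `(30, 35)` should give six pages; stopping at page 32 is only expected when the document itself ends there, which separates a genuine end of file from the premature stop reported in this issue.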

`page_range` parameter stops prematurely at page 32 when starting from page 30+

@Ouassim-Hamdani The actual fix this needs is here: https://github.com/docling-project/docling-ibm-models/pull/141. It is hard to understand how this bug did not show any effect anywhere earlier...