Christopher Pereira

Results 71 comments of Christopher Pereira

It's not the SDK, but Rosetta. I fixed this in upstream master some months ago.

Test if the YUV callback receives additional 2048 bytes on the Mini 2: https://github.com/RosettaDrone/rosettadrone/issues/151

Predictor initialized with: `model = ocr_predictor(pretrained=True, assume_straight_pages=False)` But the problem persists on page 1: ``` Notario y Conservador de Bienes Raices Licanten Vilma Beatriz Navarro

> @kripper Have you already tried to disable block and/or line resolving? https://mindee.github.io/doctr/using_doctr/using_models.html#two-stage-approaches
>
> `resolve_blocks=False` `resolve_lines=False`

It's now mixing blocks multiple times per line. What about taking a...

> Do you have a direct reference to the code or algorithm?

No, but I will research tomorrow.

Have you tried existing tools to convert doctr's hOCR output to text? There are many. Tesseract is probably also using some of them.
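For illustration, a minimal sketch of such a conversion using only the Python standard library. It assumes the input follows the usual hOCR convention of `ocr_line` and `ocrx_word` class names; `HocrTextExtractor` and `hocr_to_text` are hypothetical names, not part of doctr or Tesseract, and I haven't checked this against doctr's actual export.

```python
from html.parser import HTMLParser


class HocrTextExtractor(HTMLParser):
    """Collects word text per line from hOCR markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = []        # list of lines, each a list of word strings
        self._in_word = False  # True while inside an ocrx_word element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            self.lines.append([])   # start a new text line
        elif "ocrx_word" in classes:
            self._in_word = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current word,
        # so markup nested inside a word is not handled.
        self._in_word = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_word and text:
            if not self.lines:      # word found outside any ocr_line
                self.lines.append([])
            self.lines[-1].append(text)


def hocr_to_text(hocr: str) -> str:
    """Return plain text with one output line per ocr_line element."""
    parser = HocrTextExtractor()
    parser.feed(hocr)
    return "\n".join(" ".join(words) for words in parser.lines if words)
```

This only follows the line grouping already present in the markup, so it won't help if the lines themselves are assigned wrongly.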

> sometimes the models predict lines in the wrong block

The synthesized page looks fine. Identifying lines shouldn't be that difficult IMO.

![out](https://github.com/mindee/doctr/assets/1479804/16c55991-3e37-4396-87e6-6e52894ed5ae)
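To make the idea concrete, here is a rough sketch of one way to identify lines from word boxes alone: greedily merge words whose vertical extents overlap. This is not doctr's algorithm, just an assumption of how it could be done; the `Word` tuple layout, `group_words_into_lines` and the `min_overlap` threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical word layout: (text, x_min, y_min, x_max, y_max),
# e.g. in relative page coordinates.
Word = Tuple[str, float, float, float, float]


def group_words_into_lines(words: List[Word], min_overlap: float = 0.5) -> List[str]:
    """Greedy grouping: a word joins the first line whose vertical extent
    covers at least `min_overlap` of the word's height."""
    lines = []  # each entry: [y_min, y_max, [words]]
    # Process words top to bottom by vertical center.
    for word in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        _, _, y0, _, y1 = word
        height = max(y1 - y0, 1e-9)  # guard against zero-height boxes
        for line in lines:
            overlap = min(y1, line[1]) - max(y0, line[0])
            if overlap / height >= min_overlap:
                line[0] = min(line[0], y0)  # grow the line's extent
                line[1] = max(line[1], y1)
                line[2].append(word)
                break
        else:
            lines.append([y0, y1, [word]])  # no match: start a new line
    lines.sort(key=lambda entry: entry[0])
    # Within each line, order words left to right.
    return [
        " ".join(w[0] for w in sorted(entry[2], key=lambda w: w[1]))
        for entry in lines
    ]
```

Skewed or multi-column pages would need more than this (for example, rotating the boxes first), but for straight pages the grouping ignores block assignment entirely.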

Have you checked Aider? https://aider.chat/

GPT Pilot is also very interesting.

> But frankly speaking, this is a relatively complex scientific research problem.

Please elaborate what are the current main issues you are struggling with and I will provide you with...