gcv2hocr

Issue when converting to hocr

Open ebriquet64 opened this issue 7 years ago • 3 comments

Hello,

Thanks for your code!

I have an issue with this file when I try to convert it to hOCR.

The JPEG-to-JSON conversion was done with gcvocr.sh.

Please find an example attached. Thanks for your help.

MAXX.json.zip

ebriquet64, Oct 24 '17 14:10

Thank you for your comment.

This is a bug in the JSON output from Google.

Each word should have eight numbers: an x and a y value for each of the four corners of its bounding box.

Some of the words are missing an x or y number.

Please add the missing numbers manually, then rerun gcv2hocr until the segfault goes away.
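
Adding the numbers by hand gets tedious on a large file, so the same repair can be sketched as a small script. This is only a sketch under assumptions: it assumes the usual Google Cloud Vision layout, where each bounding box is a `vertices` list of `{"x": ..., "y": ...}` objects, and it fills every missing coordinate with 0, which is a guess that should be checked against your image. The function name `fill_missing_coords` and the sample data are made up for illustration and are not part of gcv2hocr.

```python
def fill_missing_coords(node, default=0):
    """Walk a parsed JSON tree and add any missing "x"/"y" key to each vertex.

    Returns the number of coordinates that were added.
    ASSUMPTION: bounding boxes are stored under a "vertices" key as a list
    of dicts, and `default` (0 here) is an acceptable filler value.
    """
    added = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "vertices" and isinstance(value, list):
                for vertex in value:
                    if not isinstance(vertex, dict):
                        continue
                    for axis in ("x", "y"):
                        if axis not in vertex:
                            vertex[axis] = default
                            added += 1
            else:
                added += fill_missing_coords(value, default)
    elif isinstance(node, list):
        for item in node:
            added += fill_missing_coords(item, default)
    return added


# Invented sample: two vertices lack "x", as in the broken output described above.
sample = {
    "textAnnotations": [
        {
            "description": "Hello",
            "boundingPoly": {
                "vertices": [
                    {"y": 12},
                    {"x": 80, "y": 12},
                    {"x": 80, "y": 40},
                    {"y": 40},
                ]
            },
        }
    ]
}

added = fill_missing_coords(sample)
print("coordinates added:", added)
```

To use it on a real file, load the JSON with `json.load`, call the function on the result, and write it back with `json.dump` before running gcv2hocr.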

dinosauria123, Oct 24 '17 15:10

Thanks for your very quick response!

I am trying to find a good solution. I have tested the Microsoft API, but its JSON format is not the same as Google's. Is there any way to use it?

Thanks for your help.

ebriquet64, Oct 25 '17 09:10

Thank you for your comment.

An hOCR file needs each word (or letter) and its coordinates.

You have to extract the words and coordinates from the MS JSON output, then rearrange them into hOCR format.

My C code is not good, but it should help you understand what is going on.

If you prefer Python, a Python port is also included in the gcv2hocr sources.
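
The two steps described above (extract words and coordinates, then rearrange them as hOCR) might look roughly like the sketch below. It is not the gcv2hocr code. It assumes the MS JSON has a `regions` > `lines` > `words` nesting, where each word carries a `text` field and a `boundingBox` string of the form `"left,top,width,height"`; check this against your actual output. The function names `ms_words` and `to_hocr` and the sample data are invented for illustration.

```python
from html import escape


def ms_words(ms_json):
    """Yield (text, x0, y0, x1, y1) for each word.

    ASSUMPTION: words sit under regions -> lines -> words, and
    "boundingBox" is a "left,top,width,height" string.
    """
    for region in ms_json.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                left, top, width, height = (
                    int(v) for v in word["boundingBox"].split(",")
                )
                yield word["text"], left, top, left + width, top + height


def to_hocr(words, page_width, page_height):
    """Wrap the words in a minimal hOCR page (one ocr_page, flat ocrx_word spans)."""
    spans = []
    for i, (text, x0, y0, x1, y1) in enumerate(words, start=1):
        spans.append(
            f"<span class='ocrx_word' id='word_1_{i}' "
            f"title='bbox {x0} {y0} {x1} {y1}'>{escape(text)}</span>"
        )
    body = "\n".join(spans)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset='utf-8'/>\n</head>\n<body>\n"
        f"<div class='ocr_page' id='page_1' "
        f"title='bbox 0 0 {page_width} {page_height}'>\n{body}\n</div>\n"
        "</body>\n</html>\n"
    )


# Invented sample in the assumed MS layout.
sample = {
    "regions": [
        {
            "lines": [
                {
                    "words": [
                        {"boundingBox": "10,20,30,12", "text": "Hello"},
                        {"boundingBox": "45,20,40,12", "text": "A&B"},
                    ]
                }
            ]
        }
    ]
}

hocr = to_hocr(ms_words(sample), page_width=200, page_height=100)
print(hocr)
```

The sketch keeps the output flat (no `ocr_line` or paragraph grouping) to stay short. The page width and height must come from the image itself, since the word boxes alone do not give the page size.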

dinosauria123, Oct 25 '17 09:10