gcv2hocr
gcv2hocr converts Google Cloud Vision OCR output to hOCR, which can then be used to make a searchable PDF.
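As an illustration of the kind of conversion involved, here is a minimal sketch. It is not the gcv2hocr implementation. It assumes a response object that carries a `textAnnotations` list in which the first entry is the whole-page text and the remaining entries are single words, each with a `boundingPoly` of four vertices. The function names `bbox_from_vertices` and `words_to_hocr` and the sample data are made up for this example, and the page number in the word ids is fixed to 1.

```python
from html import escape


def bbox_from_vertices(vertices):
    """Return (x0, y0, x1, y1) enclosing the polygon vertices."""
    # Assumption: a coordinate may be missing from the JSON when it is 0,
    # so fall back to 0 instead of raising KeyError.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def words_to_hocr(response):
    """Build one hOCR ocrx_word span per word annotation (hypothetical helper)."""
    annotations = response.get("textAnnotations", [])
    spans = []
    # Skip the first annotation: it holds the text of the whole page.
    for i, ann in enumerate(annotations[1:], start=1):
        x0, y0, x1, y1 = bbox_from_vertices(ann["boundingPoly"]["vertices"])
        spans.append(
            "<span class='ocrx_word' id='word_1_%d' title='bbox %d %d %d %d'>%s</span>"
            % (i, x0, y0, x1, y1, escape(ann["description"]))
        )
    return "\n".join(spans)


# Made-up sample data in the assumed layout.
sample = {
    "textAnnotations": [
        {"description": "Hello world",
         "boundingPoly": {"vertices": [{"y": 20}, {"x": 120, "y": 20},
                                       {"x": 120, "y": 40}, {"y": 40}]}},
        {"description": "Hello",
         "boundingPoly": {"vertices": [{"y": 20}, {"x": 60, "y": 20},
                                       {"x": 60, "y": 40}, {"y": 40}]}},
        {"description": "world",
         "boundingPoly": {"vertices": [{"x": 70, "y": 20}, {"x": 120, "y": 20},
                                       {"x": 120, "y": 40}, {"x": 70, "y": 40}]}},
    ]
}

print(words_to_hocr(sample))
```

A complete hOCR file would also need the surrounding HTML document and page, paragraph, and line elements; the sketch covers only the word-level bounding boxes.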
python gcv2hocr.py Capture.jpg.json > capture.hocr
Traceback (most recent call last):
  File "gcv2hocr.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1,...
With gcv2hocr.py, the invisible-font text for vertical Japanese text is positioned at the bottom of the PDF document. gcv2hocr (C version) produces better output than gcv2hocr.py, but it uses CR...
Hello, thanks for your code! I have an issue with this file when I try to convert it to hOCR. The JPEG-to-JSON conversion was done by gcvocr.sh. Find attached...
If you need to add support for your language, please check the following page: https://github.com/filak/hOCR-to-ALTO/blob/master/codes_lookup.xml You will see a3h="***" in the code. This is the language code written to the hOCR output.
gcv2hocr does not support Japanese vertical text. This will be supported when Google Cloud Vision OCR supports it.
I found that Microsoft has opened up its computer vision service: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api It has OCR JSON output, but the format is different from Google's. Do we need to make mscv2hocr?
Konstantin Baierer made a Python port. Continue the discussion at https://github.com/dinosauria123/gcv2hocr/pull/3