gcv2hocr issues

gcv2ocr2.py output - Correct bbox for individual words, but "ocr_lines" completely busted

1

I had to manually specify the page_width and page_height to match my PDF images to get the words to align. I am sure the words are perfectly aligned by manually...

hengyu95

gcv2hocr2.py - Support using vertices instead of normalizedVertices for bbox

5

Currently in `gcv2hocr2.py`, the bounding-box coordinates for `block`, `paragraph`, and `word` are computed from their respective `boundingBox.normalizedVertices`: https://github.com/dinosauria123/gcv2hocr/blob/40adc1026fc10a0fbe746a0a26329d0e9bcd527a/gcv2hocr2.py#L123 https://github.com/dinosauria123/gcv2hocr/blob/40adc1026fc10a0fbe746a0a26329d0e9bcd527a/gcv2hocr2.py#L129 https://github.com/dinosauria123/gcv2hocr/blob/40adc1026fc10a0fbe746a0a26329d0e9bcd527a/gcv2hocr2.py#L135 Is it possible to create a new...

SoloSynth1
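To illustrate the request in this issue, here is a minimal sketch of a helper that prefers pixel `vertices` and falls back to `normalizedVertices` scaled by the page size. The function name `bbox_from_bounding_box` and its parameters are hypothetical and are not part of `gcv2hocr2.py`; the sketch also assumes that a missing `x` or `y` key means zero.

```python
def bbox_from_bounding_box(bounding_box, page_width, page_height):
    """Return (x0, y0, x1, y1) in pixels from a GCV-style boundingBox dict.

    Hypothetical helper: prefers absolute `vertices`; otherwise scales
    `normalizedVertices` (values in 0..1) by the page dimensions.
    """
    if bounding_box.get("vertices"):
        # Assumption: a missing coordinate key is treated as 0.
        points = [(v.get("x", 0), v.get("y", 0)) for v in bounding_box["vertices"]]
    else:
        points = [
            (round(v.get("x", 0.0) * page_width), round(v.get("y", 0.0) * page_height))
            for v in bounding_box.get("normalizedVertices", [])
        ]
    if not points:
        raise ValueError("boundingBox has neither vertices nor normalizedVertices")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # The four corners may be rotated, so take the enclosing rectangle.
    return min(xs), min(ys), max(xs), max(ys)
```

Taking the minimum and maximum over all corners keeps the result an axis-aligned rectangle even when the detected box is slightly rotated.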

Adding Multi-Threading Support for gcvocr.sh and Fix Here

1

I have updated the `gcvocr.sh` script to support multi-threading. You can view the source code here: https://gist.github.com/UBISOFT-1/4017d641c329159f8de3d203efc919e1 I am adding the...

UBISOFT-1

gcv2hocr doesn't rectify negative coordinates in GCV API response

According to the hOCR standard (Latest is v1.2 as of March 2021), the bbox property specifies `uint` to be used. That means all values must be unsigned. ([http://kba.cloud/hocr-spec/1.2/#propdef-bbox](http://kba.cloud/hocr-spec/1.2/#propdef-bbox)) However, the...

SoloSynth1
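One possible way to satisfy the unsigned-integer requirement described in this issue is to clamp negative values to zero before writing the `bbox` property. This is only a sketch; `clamp_bbox` and `format_hocr_bbox` are hypothetical names, not functions from gcv2hocr, and clamping is one choice among several (a converter could also drop or flag such boxes).

```python
def clamp_bbox(x0, y0, x1, y1):
    """Clamp each coordinate to a non-negative integer (hypothetical helper)."""
    return tuple(max(0, int(value)) for value in (x0, y0, x1, y1))


def format_hocr_bbox(x0, y0, x1, y1):
    """Build the hOCR `bbox` property string from clamped coordinates."""
    return "bbox %d %d %d %d" % clamp_bbox(x0, y0, x1, y1)
```

For example, `format_hocr_bbox(-3, 5, 40, 60)` yields `bbox 0 5 40 60`.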

gcv2ocr.py does not convert json

6

I'm working with the attached JSON file from GCV, but when I run gcv2ocr.py, the hOCR output only has metadata and lacks content. [osh-sample-1911a-0001.json.zip](https://github.com/dinosauria123/gcv2hocr/files/4689613/osh-sample-1911a-0001.json.zip)

sarepal

Support Document text detection

1

When posting an OCR request, we can choose between two types of response. A TEXT_DETECTION response includes the detected phrase, its bounding box, and individual words and their bounding boxes: A DOCUMENT_TEXT_DETECTION...

dinosauria123
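The response type mentioned in this issue is selected by the `type` field of the request's `features` entry. The following is a minimal sketch of building such a request body for the `images:annotate` REST endpoint; the function name `build_ocr_request` is hypothetical, and sending the request (authentication, HTTP client) is left out.

```python
import base64
import json


def build_ocr_request(image_bytes, document_mode=True):
    """Return a JSON string for a Vision API annotate request (hypothetical helper).

    document_mode=True asks for DOCUMENT_TEXT_DETECTION,
    otherwise TEXT_DETECTION is requested.
    """
    feature_type = "DOCUMENT_TEXT_DETECTION" if document_mode else "TEXT_DETECTION"
    body = {
        "requests": [
            {
                # Image content is sent base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type}],
            }
        ]
    }
    return json.dumps(body)
```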

GCV to HOCR or PAGE conversion not working

2

Hi @dinosauria123! This is the issue I posted on ocr-fileformat: https://github.com/UB-Mannheim/ocr-fileformat/issues/121 As per your request I'm opening the issue here, copying the text: I have the JSON output of google...

OmriPi

gcv2hocr does not support scanned pdf

4

1. Save sample/jpn/jptest2.jpg as jptest2.pdf.
2. Upload it to Google Vision (storage).
3. Generate output.json with `gcloud ml vision detect-text-pdf gs://my_bucket/input_file gs://my_bucket/out_put_prefix`, according to [text_detection_pdf](https://cloud.google.com/vision/docs/pdf#vision_text_detection_pdf_gcs-gcloud).
4. Download output.json.
5. gcv2hocr...

ctrngk

Converting JSON to HOCR (Segmentation Fault)

7

First off, thanks for an awesome piece of software. For the most part, it works great! For some reason, after converting many thousands of pages, I've come across this error...

pauf

Could not convert json output

6

I tried to convert the JSON output on Google's page using gcv2hocr.py: https://cloud.google.com/vision/docs/ocr

    Traceback (most recent call last):
      File "gcv2hocr2.py", line 146, in <module>
        page = fromResponse(resp, **args.__dict__)
      File "gcv2hocr.py", line...

heroturtle
