dinosauria123

29 comments by dinosauria123

Or this one? https://hub.docker.com/r/ubma/ocr-fileformat/

Thank you for your report. I will check the JSON output, but patches may be delayed because I am busy with my job right now.

Do you want to convert images to hOCR? You can use Tesseract OCR. https://github.com/tesseract-ocr/tesseract

I have checked gcv2hocr, and the output seems to be fine. Did you use gcvocr.sh to get the JSON output? Please attach your JSON output to your comment.
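Before attaching the JSON, it can help to check that it actually contains word annotations. The sketch below assumes the file holds a standard Google Cloud Vision `images:annotate` response (a top-level `responses` list whose items carry `textAnnotations`); I have not confirmed that gcvocr.sh saves exactly this shape, and `gcv_words` is a hypothetical helper, not part of gcv2hocr.

```python
import json


def gcv_words(response_json: str) -> list[tuple[str, list[tuple[int, int]]]]:
    """List (word, bounding-polygon vertices) pairs from a Cloud Vision response.

    Assumes the standard images:annotate layout: {"responses": [{"textAnnotations": [...]}]}.
    """
    data = json.loads(response_json)
    annotations = data["responses"][0].get("textAnnotations", [])
    words = []
    # textAnnotations[0] holds the full text of the image; the rest are single words.
    for ann in annotations[1:]:
        vertices = ann.get("boundingPoly", {}).get("vertices", [])
        # Cloud Vision omits coordinates equal to 0, so default missing x/y to 0.
        points = [(v.get("x", 0), v.get("y", 0)) for v in vertices]
        words.append((ann.get("description", ""), points))
    return words
```

If this returns an empty list for your file, the problem is probably in the JSON rather than in the hOCR conversion.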

Check here. https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output
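As a minimal sketch of that page's usage (`tesseract imagename outputbase [options...] [configfile...]`), the snippet below calls Tesseract with the `hocr` config from Python. It assumes the `tesseract` executable is on PATH and a recent version that writes `<outputbase>.hocr`; the helper names and the file `page.png` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_hocr_command(image: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a command line that makes Tesseract write <output_base>.hocr.

    The trailing "hocr" argument selects Tesseract's hOCR output config.
    """
    return ["tesseract", image, output_base, "-l", lang, "hocr"]


def run_tesseract_hocr(image: str, output_base: str, lang: str = "eng") -> Path:
    """Run Tesseract (if installed) and return the path of the generated .hocr file."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(tesseract_hocr_command(image, output_base, lang), check=True)
    return Path(output_base + ".hocr")
```

For example, `run_tesseract_hocr("page.png", "page")` should produce `page.hocr` next to the image.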

I think you have to use multiple tools. For example, hOCR-to-PDF conversion is possible with hocr-tools: https://github.com/tmbdev/hocr-tools#hocr-pdf. There are also many tools for converting PDF to other formats.
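When chaining tools, a small step of your own is sometimes needed to pull words out of hOCR. In hOCR each recognized word is usually an element with class `ocrx_word` whose `title` attribute starts with `bbox x0 y0 x1 y1`. The sketch below collects those words with Python's standard `html.parser`; `HocrWordParser` is a hypothetical helper, not part of hocr-tools, and it only handles the `bbox` property.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from ocrx_word elements in an hOCR document."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[tuple[str, tuple[int, ...]]] = []
        self._bbox = None  # bbox of the ocrx_word element currently open, if any
        self._depth = 0    # nesting depth of tags (e.g. <strong>) inside that word
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # a tag nested inside the current word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._depth = 0
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1  # closing a nested tag, the word is still open
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)
```

Feed it the hOCR text with `parser.feed(text)` and read `parser.words` afterwards.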

Do you know ALTO? https://en.wikipedia.org/wiki/ALTO_(XML) If you want to work with OCR formats, ALTO is better than hOCR. https://github.com/altoxml/documentation/wiki/Software
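To show what ALTO looks like in practice: recognized words are `String` elements with a `CONTENT` attribute (plus `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` coordinates), grouped into `TextLine` and `TextBlock` elements. The sketch below reads line text with Python's standard `xml.etree.ElementTree`. It ignores namespaces so it should work across ALTO versions, which is an assumption; `alto_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the text of each ALTO TextLine by joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only String children carry words; SP (space) and HYP elements are skipped.
        words = [child.get("CONTENT", "") for child in element
                 if local_name(child.tag) == "String"]
        lines.append(" ".join(words))
    return lines
```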

I have never used this, but I think it is what you want: https://github.com/tabulapdf/tabula-extractor http://tabula.technology/ I think this topic is not related to gcv2hocr; may I close this issue?

Sorry, the correct debug message is below:

    rst:0x8 (TG1WDT_SYS_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
    configsip: 0, SPIWP:0xee
    clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
    mode:DIO, clock div:1
    load:0x3fff0030,len:1184
    load:0x40078000,len:13260
    load:0x40080400,len:3028
    entry 0x400805e4