Konstantin Baierer comments

Results: 277 comments of Konstantin Baierer

GCV to HOCR or PAGE conversion not working

To convert PDF to Google Cloud Vision JSON, you need to use Google Cloud Vision, which is commercial cloud software that we neither support nor endorse. Once you have that...

GCV to HOCR or PAGE conversion not working

You could also convert to PAGE via hOCR and try https://github.com/PRImA-Research-Lab/prima-page-to-pdf

GCV to HOCR or PAGE conversion not working

Then it's best to ask @dinosauria123 (not sure whether they're subscribed to issues here but they should see the mention). The code is at https://github.com/dinosauria123/gcv2hocr

Support new features of ALTO version 3.0, 3.1

Eventually, also [recording of glyph positions](https://github.com/altoxml/schema/issues/26), but not sure if and when.
> Both FineReader-XML and hOCR already offer character level encoding. It should be possible to transform either of these two...

Microsoft Computer Vision API

Maybe @dinosauria123 has a take on this, since he's been developing a converter from Google's Cloud Vision API responses to hOCR at https://github.com/dinosauria123/gcv2hocr

Improve privacy of web interface by reducing third party contents

The method du jour is to structure the code as a module, rely on Node.js for managing the included libraries, and bundle it for distribution with webpack/browserify/parcel.

Add update mechanism

> Thus, I think we need a proper update mechanism in the Makefile.
Easiest solution: clone the repository with depth 1 into a temporary directory, uninstall, reinstall. It's less efficient, but the cleanest.
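
The clone, uninstall, reinstall sequence could be sketched roughly as follows. This is only an illustration: the function names, the repository URL parameter, and the `uninstall` and `install` make targets are assumptions, not taken from the project's actual Makefile.

```python
import subprocess
import tempfile
from pathlib import Path


def update_commands(repo_url, workdir):
    """Return the commands for a clean update: shallow clone, uninstall, reinstall."""
    checkout = str(Path(workdir) / "checkout")
    return [
        ["git", "clone", "--depth", "1", repo_url, checkout],
        ["make", "-C", checkout, "uninstall"],  # assumed target name
        ["make", "-C", checkout, "install"],    # assumed target name
    ]


def update(repo_url, run=subprocess.check_call):
    """Run the update inside a temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as workdir:
        for command in update_commands(repo_url, workdir):
            run(command)
```

Passing a different `run` callable (for example `print`) shows the commands without executing them.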

"ocr-transform page alto ... ...": loosing text

> Perhaps duplicate of PRImA-Research-Lab/prima-page-converter#13
Indeed, PAGE-ALTO conversion requires word segmentation. @maxnth Can you think of any sensible workaround?

"ocr-transform page alto ... ...": loosing text

Great, but maybe we can integrate pseudo-word creation on-the-fly directly into the converter, with a command-line flag.
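
One way such pseudo-words could be derived from a line that only has line-level text and coordinates is sketched below. It is a guess at the idea, not the converter's behaviour: the function name `pseudo_words`, the dictionary layout, and the assumption of horizontal text with equal character widths are all hypothetical.

```python
def pseudo_words(text, x0, y0, x1, y1):
    """Split a line into whitespace-separated tokens with estimated boxes.

    Assumes horizontal left-to-right text and equal character widths,
    so the boxes are rough estimates, not measured positions.
    """
    if not text:
        return []
    char_width = (x1 - x0) / len(text)
    words = []
    position = 0
    for token in text.split():
        # Locate the token in the original string to keep spacing intact.
        start = text.index(token, position)
        end = start + len(token)
        words.append({
            "text": token,
            "bbox": (round(x0 + start * char_width), y0,
                     round(x0 + end * char_width), y1),
        })
        position = end
    return words
```

For the line `"ab cd"` in a box from x=0 to x=50, this gives two pseudo-words spanning x=0..20 and x=30..50, each with the line's full height.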

"ocr-transform page alto ... ...": loosing text

> seems not to be fixed in v0.4.0.
ocrd_calamari is at 1.0.0 and calamari at 1.0.5, but word-level PAGE output is indeed not implemented yet in calamari, AFAICT.