Konstantin Baierer

277 comments by Konstantin Baierer

To convert PDF to Google Cloud Vision JSON, you need to use Google Cloud Vision, which is a commercial cloud software we neither support nor endorse. Once you have that...

You could also convert to PAGE via hOCR and try https://github.com/PRImA-Research-Lab/prima-page-to-pdf

Then it's best to ask @dinosauria123 (not sure whether they're subscribed to issues here but they should see the mention). The code is at https://github.com/dinosauria123/gcv2hocr

Eventually, also [recording of glyph positions](https://github.com/altoxml/schema/issues/26), but not sure if and when. > Both FineReader-XML and hOCR already offer character level encoding. It should be possible to transform either of these two...

Maybe @dinosauria123 has a take on this, since he's been developing a converter from Google's Cloud Vision API responses at https://github.com/dinosauria123/gcv2hocr

The method du jour is to structure code as a module, rely on Node.js for managing included libraries, and bundle for distribution with webpack/browserify/parcel.

> Thus, I think we need a proper update mechanism in the Makefile. Easiest solution: shallow-clone the repository (depth 1) into a temporary directory, uninstall, then reinstall. It's less efficient, but it's the cleanest.
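The clone, uninstall, reinstall sequence could be sketched roughly as follows. This is only an illustration in Python, not the project's actual Makefile logic: the `uninstall` and `install` make targets, the function names and the repository URL are all assumptions.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def update_commands(repo_url: str, checkout: Path) -> List[List[str]]:
    """Return the commands for a clean update, in execution order."""
    return [
        # Shallow clone: only the most recent commit is fetched.
        ["git", "clone", "--depth", "1", repo_url, str(checkout)],
        # Hypothetical Makefile targets; adjust to the real project.
        ["make", "-C", str(checkout), "uninstall"],
        ["make", "-C", str(checkout), "install"],
    ]


def update(repo_url: str, run=subprocess.run) -> None:
    """Clone into a throwaway directory, then uninstall and reinstall."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkout"
        for cmd in update_commands(repo_url, checkout):
            run(cmd, check=True)  # stop at the first failing step
```

The temporary directory is removed when the `with` block exits, so nothing from the fresh checkout lingers after the reinstall. The `run` parameter exists only so the sequence can be dry-run or tested without touching the network.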

> Perhaps duplicate of PRImA-Research-Lab/prima-page-converter#13 Indeed, PAGE-ALTO conversion requires word segmentation. @maxnth Can you think of any sensible workaround?

Great, but maybe we can integrate pseudo-word creation on the fly directly into the converter, behind a command-line flag.
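One way pseudo-word creation might work, as a minimal sketch only: split the line text at spaces and give each token a horizontal slice of the line's bounding box proportional to its character offsets. This is not the converter's implementation; the function name, the box layout `(x0, y0, x1, y1)` and the uniform-character-width assumption are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), an assumed layout


def pseudo_words(text: str, box: Box) -> List[Tuple[str, Box]]:
    """Split a text line into words with estimated bounding boxes."""
    x0, y0, x1, y1 = box
    if not text.strip():
        return []
    # Assume every character (spaces included) has the same width.
    char_width = (x1 - x0) / len(text)
    words = []
    pos = 0  # character offset of the current token within the line
    for token in text.split(" "):
        if token:
            left = round(x0 + pos * char_width)
            right = round(x0 + (pos + len(token)) * char_width)
            words.append((token, (left, y0, right, y1)))
        pos += len(token) + 1  # skip the token and the following space
    return words


print(pseudo_words("ab cde", (0, 0, 60, 10)))
# → [('ab', (0, 0, 20, 10)), ('cde', (30, 0, 60, 10))]
```

The resulting boxes are only estimates, since real glyph widths vary, which is a reason to keep such behaviour optional rather than the default.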

> seems not to be fixed in v0.4.0. ocrd_calamari is at 1.0.0 and calamari at 1.0.5, but word-level PAGE output is indeed not implemented yet in calamari AFAICT.