Robert Sachunsky comments

Results: 735 comments by Robert Sachunsky

Google Cloud Vision to PAGE-XML

Since #156 we do have a working GCV converter here based on https://github.com/PRImA-Research-Lab/prima-page-converter, so there is no actual need for https://github.com/PRImA-Research-Lab/cloud-vision-ocr-to-page. Comparing both implementations, IIUC we have: | | |...

Google Cloud Vision to PAGE-XML

> It's unfortunate that the confidences aren't serialized, like gcv2hocr does with `x_wconf` for hOCR though, but with development largely stalled, nothing much we can do except rewrite ourselves. We...
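For reference, here is a minimal sketch of how a GCV word's `confidence` could be carried over into PAGE-XML's optional `TextEquiv/@conf` attribute, the counterpart of `x_wconf` in hOCR. The function name `gcv_word_to_page` is hypothetical and is not taken from either converter. It handles a single word only and assumes the JSON layout of GCV's `fullTextAnnotation` words (`boundingBox.vertices`, `symbols[].text`, `confidence`) and the PAGE 2019-07-15 namespace.

```python
import xml.etree.ElementTree as ET

# PAGE-XML 2019 namespace (assumption: the target schema version)
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def gcv_word_to_page(word, word_id):
    """Convert one GCV fullTextAnnotation word dict into a PAGE-XML <Word> element."""
    # GCV JSON may omit a vertex coordinate when it is 0, hence the defaults
    points = " ".join(
        f"{v.get('x', 0)},{v.get('y', 0)}" for v in word["boundingBox"]["vertices"]
    )
    # word text is the concatenation of its symbols
    text = "".join(s["text"] for s in word.get("symbols", []))

    elem = ET.Element(f"{{{PAGE_NS}}}Word", id=word_id)
    # PAGE expects Coords before TextEquiv inside a Word
    ET.SubElement(elem, f"{{{PAGE_NS}}}Coords", points=points)
    equiv = ET.SubElement(elem, f"{{{PAGE_NS}}}TextEquiv")
    if "confidence" in word:
        # TextEquiv/@conf is optional in PAGE and takes a value in [0, 1]
        equiv.set("conf", str(word["confidence"]))
    ET.SubElement(equiv, f"{{{PAGE_NS}}}Unicode").text = text
    return elem
```

Building the full Page/TextRegion/TextLine hierarchy around these elements is left out. How GCV blocks and paragraphs should map onto PAGE regions and lines is exactly where the existing converters differ.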

Support conversion from and to Textract JSON

Alas, the new converter is still incomplete, so

> * forms, and
> * tables

do **not** work yet. See https://github.com/slub/textract2page/issues/2

Support conversion from and to Textract JSON

Update: tables work now, but the converter submodule still needs to be updated here.
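To illustrate what table support involves on the Textract side: a `TABLE` block links to its `CELL` blocks via `Relationships` of type `CHILD`. Each cell carries 1-based `RowIndex`/`ColumnIndex` and in turn links to its `WORD` blocks. The sketch below rebuilds a plain text grid from that. The helper `table_grid` is hypothetical and is not textract2page's implementation. It ignores `RowSpan`/`ColumnSpan`, merged cells and selection elements.

```python
def table_grid(blocks, table_id):
    """Rebuild one Textract TABLE block as a list of rows of cell strings."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # parents point to children through Relationships entries of Type "CHILD"
        return [
            i
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for i in rel["Ids"]
        ]

    # keep only CELL children (tables may also link titles, footers, merged cells)
    cells = [
        by_id[i] for i in child_ids(by_id[table_id]) if by_id[i]["BlockType"] == "CELL"
    ]
    if not cells:
        return []
    n_rows = max(c["RowIndex"] for c in cells)
    n_cols = max(c["ColumnIndex"] for c in cells)
    grid = [[""] * n_cols for _ in range(n_rows)]
    for cell in cells:
        # join the WORD children of the cell; other child types are skipped
        words = [
            by_id[i]["Text"] for i in child_ids(cell) if by_id[i]["BlockType"] == "WORD"
        ]
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
    return grid
```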

adapt to Numpy and Pillow deprecations

@kba I really think this should be merged soon – we still do need the cropper here. Should I add some Dockerhub CD as well?

Long sequences run into RecursionError

@hadaev8 I'll try. Having tested several libraries available on PyPI (when searching with `align` or `edit distance` keywords) I finally reverted to the standard `difflib.SequenceMatcher` (with `isjunk=None, autojunk=False`) – although...
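The `difflib` route could look roughly like this. CPython's `SequenceMatcher.get_matching_blocks` keeps pending sub-ranges in an explicit queue instead of recursing, so long inputs do not hit the recursion limit. They can still be slow, though, since matching is quadratic in the worst case. `autojunk=False` disables the heuristic that otherwise treats frequent elements as junk once the second sequence has 200 or more items. The `align` helper is hypothetical, and pairing `replace` ranges position-wise (padding with `None`) is one possible choice, not necessarily what the project does.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def align(a, b):
    """Align two sequences; returns a list of (a_item or None, b_item or None) pairs."""
    # isjunk=None: nothing is junk; autojunk=False: no popularity heuristic
    matcher = SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs.extend(zip(a[i1:i2], b[j1:j2]))
        elif tag == "delete":
            pairs.extend((x, None) for x in a[i1:i2])
        elif tag == "insert":
            pairs.extend((None, y) for y in b[j1:j2])
        else:  # "replace": pair position-wise, pad the shorter side with None
            pairs.extend(zip_longest(a[i1:i2], b[j1:j2]))
    return pairs
```

For example, `align("ab", "b")` yields `[("a", None), ("b", "b")]`.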

Unable to segment specific images

Shapely is [pinned to 1.8.x now](https://github.com/mittagessen/kraken/blob/a0c395727c011d3283b34b5f7a9ef6d85970e6d0/setup.cfg#L59). When switching to shapely 2.0.1, I do get self-intersections again.
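To see which polygons are affected independently of the installed Shapely version, the self-intersection test can also be done by hand. The following is a hypothetical helper, not kraken code. It is a naive O(n²) check of all non-adjacent edge pairs of a ring given as `(x, y)` tuples. Within Shapely itself, `Polygon(ring).is_valid` and `shapely.validation.explain_validity` report this condition, among other kinds of invalidity.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p, q, r):
    """For r collinear with p-q: is r inside the bounding box of segment p-q?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing or touching
    # collinear special cases
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def self_intersects(ring):
    """True if any two non-adjacent edges of the closed ring intersect."""
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop an explicit closing point
    n = len(pts)
    edges = [(pts[i], pts[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge share a vertex
            if segments_intersect(*edges[i], *edges[j]):
                return True
    return False
```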

performance with high-res images

> I am confused. Where does this actually happen?

Here:

- https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L2007-L2008
- https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L373
- https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L318-L323

(So, essentially, if the column detector is confident enough, there _can_ be downsampling.)

What is the known working GPU config?

> The process is loaded into GPU memory, but the GPU is never used. I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about **low** utilisation....

What is the known working GPU config?

Sorry, error on my part. The cause was an insufficient CUDA/TF installation. I probably ran into #72 as well. (I am on CUDA 11.7 though, and now it **does** work. So...
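Since the cause here turned out to be the CUDA/TF installation, a quick way to see what TensorFlow itself detects is sketched below. The helper name `gpu_report` is hypothetical. It uses `tf.config.list_physical_devices`, `tf.test.is_built_with_cuda` and `tf.sysconfig.get_build_info` (the latter available since TF 2.3), and returns empty values if TensorFlow is not installed at all. An empty `gpus` list on a machine where the card is visible to the driver typically means TF could not load matching CUDA/cuDNN libraries.

```python
def gpu_report():
    """Summarize what the installed TensorFlow build can see (empty values if TF is absent)."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}
    # build info keys such as cuda_version are only present in CUDA builds
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
```

Comparing `cuda_version`/`cudnn_version` from this report with what is actually installed on the system is a cheap first check before digging into utilisation.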