Robert Sachunsky
Since #156 we do have a working GCV converter here based on https://github.com/PRImA-Research-Lab/prima-page-converter, so there is no actual need for https://github.com/PRImA-Research-Lab/cloud-vision-ocr-to-page. Comparing both implementations, IIUC we have: | | |...
> It's unfortunate that the confidences aren't serialized, like gcv2hocr does with `x_wconf` for hOCR though, but with development largely stalled, nothing much we can do except rewrite ourselves. We...
Alas, the new converter is still incomplete, so

> * forms, and
> * tables

do **not** work yet. See https://github.com/slub/textract2page/issues/2
Update: tables work now, but the converter submodule still needs to be updated here.
@kba I really think this should be merged soon – we still do need the cropper here. Should I add some Docker Hub CD as well?
@hadaev8 I'll try. Having tested several libraries available on PyPI (found by searching for the keywords `align` or `edit distance`), I finally reverted to the standard `difflib.SequenceMatcher` (with `isjunk=None, autojunk=False`) – although...
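For illustration, here is a minimal sketch of a character-level alignment built on `difflib.SequenceMatcher` with the settings mentioned above. The function name `align`, the use of `None` as the gap marker, and the position-by-position pairing inside `replace` runs are assumptions made for this example, not necessarily what the actual implementation does.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(a: str, b: str) -> List[Pair]:
    """Align two strings character by character; None marks a gap (hypothetical helper)."""
    # isjunk=None: no element is treated as junk.
    # autojunk=False: disable the heuristic that ignores very frequent
    # elements in sequences of 200+ items, which can spoil long alignments.
    matcher = SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False)
    pairs: List[Pair] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs.extend(zip(a[i1:i2], b[j1:j2]))
        elif tag == "delete":
            pairs.extend((c, None) for c in a[i1:i2])
        elif tag == "insert":
            pairs.extend((None, c) for c in b[j1:j2])
        else:  # "replace": pair positionally, pad the shorter side with gaps
            sub_a, sub_b = a[i1:i2], b[j1:j2]
            for k in range(max(len(sub_a), len(sub_b))):
                pairs.append((sub_a[k] if k < len(sub_a) else None,
                              sub_b[k] if k < len(sub_b) else None))
    return pairs
```

For example, `align("abc", "ac")` yields `[("a", "a"), ("b", None), ("c", "c")]`. Without `autojunk=False`, characters occurring in more than 1% of a sequence of 200 or more items would be ignored as "popular" when finding matches.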
Shapely is [pinned to 1.8.x now](https://github.com/mittagessen/kraken/blob/a0c395727c011d3283b34b5f7a9ef6d85970e6d0/setup.cfg#L59). When switching to Shapely 2.0.1, I do get self-intersections again.
> I am confused. Where does this actually happen?

Here:

https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L2007-L2008

https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L373

https://github.com/qurator-spk/eynollah/blob/13bc2378d952f1ef7637480304d5383a45af789d/qurator/eynollah/eynollah.py#L318-L323

(So, essentially, if the column detector is confident enough, there _can_ be downsampling.)
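The confidence gating described above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not eynollah's code: the function name `choose_width`, its parameters, and the choice to always allow upsampling are assumptions. The actual threshold and resizing rules are in the linked lines.

```python
def choose_width(current_width: int, target_width: int,
                 confidence: float, min_confidence: float) -> int:
    """Hypothetical: pick the working image width after column classification."""
    downsampling = target_width < current_width
    if downsampling and confidence < min_confidence:
        # Column classifier not confident enough: keep the original resolution.
        return current_width
    # Confident enough (or not shrinking at all): resize to the target width.
    return target_width
```

In this sketch, `choose_width(3000, 2000, confidence=0.95, min_confidence=0.9)` returns `2000` (downsampling happens), while a confidence of `0.5` keeps `3000`.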
> The process is loaded into GPU memory, but the GPU is never used.

I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about **low** utilisation....
Sorry, error on my part. The cause was an insufficient CUDA/TF installation. I probably ran into #72 as well. (I am on CUDA 11.7 though, and now it **does** work. So...
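To tell a broken CUDA/TF setup apart from a genuine utilisation problem, it can help to ask TensorFlow directly which GPUs it can use. The following is a minimal diagnostic sketch. The helper name `gpu_report` is hypothetical. It only uses `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices("GPU")`, and it degrades gracefully if TensorFlow is not installed.

```python
from typing import List, Tuple


def gpu_report() -> Tuple[bool, bool, List[str]]:
    """Return (tensorflow_importable, built_with_cuda, visible_gpu_names)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow missing entirely: nothing can run on the GPU.
        return False, False, []
    built_with_cuda = bool(tf.test.is_built_with_cuda())
    # Physical GPU devices TensorFlow managed to initialise.
    gpus = [dev.name for dev in tf.config.list_physical_devices("GPU")]
    return True, built_with_cuda, gpus


if __name__ == "__main__":
    available, built, gpus = gpu_report()
    print(f"tensorflow importable: {available}, built with CUDA: {built}")
    print(f"visible GPUs: {gpus or 'none'}")
```

If TensorFlow reports being built with CUDA but lists no GPUs, the CUDA/cuDNN runtime libraries are typically missing or do not match the versions that TF release expects. In that case, check TensorFlow's startup log warnings.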