sbb_binarization issues

OCR-D processor is leaky

3

When processing a document of 1.5k pages of medium size (1-2 MP each), I am observing a slow but steady increase in RSS from 4 GB up to 14 GB...

bertsky
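
A growth pattern like the one reported above can be made visible by logging memory use at regular page intervals. The following is a minimal sketch using only the Python standard library, not code from the processor itself; `process_page` and `report_every` are hypothetical names introduced for illustration. It reports the peak RSS (a high-water mark), so a steady leak shows up as a steadily rising value.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def process_pages(pages, process_page, report_every=100):
    """Run process_page (hypothetical callable) on each page and
    print the peak RSS every `report_every` pages."""
    for index, page in enumerate(pages, start=1):
        process_page(page)
        if index % report_every == 0:
            print(f"after {index} pages: peak RSS {peak_rss_mib():.0f} MiB")
```

The `resource` module is only available on Unix-like systems.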

enable flowing from directory

1

adds the option to use a directory as input for batch processing

cneud
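
Directory input of this kind usually amounts to enumerating the image files and pairing each with an output path. The sketch below shows one way to do that with `pathlib`; it is not taken from the pull request. The function name `plan_batch` and the set of accepted suffixes are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to the formats actually in use.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def plan_batch(input_dir, output_dir):
    """Return sorted (source, target) path pairs for every image in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for source in sorted(input_dir.iterdir()):
        # Skip subdirectories and files that are not images.
        if source.is_file() and source.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((source, output_dir / source.name))
    return pairs
```

Each pair can then be passed to whatever single-image binarization call is in use.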

Model default-2021-03-09 vs sbb_binarization in ocrd_all

1

```
$ ocrd resmgr download ocrd-sbb-binarize default-2021-03-09
14:29:34.198 INFO ocrd.cli.resmgr - Downloading registered resource 'default-2021-03-09' (https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2021_03_09.zip)
```

```
$ ocrd-sbb-binarize -P model default-2021-03-09 -I OCR-D-IMG -O TEST-OCRD-SBB-BINARIZE
Traceback (most recent...
```

mikegerber

Update Tensorflow

1

The minimum TensorFlow version is still pinned at 2.4, but it should be raised to 2.12.1 to be in line with [eynollah](https://github.com/qurator-spk/eynollah/tree/main).

rettinghaus

Training (or Fine-Tuning) the Model

1

I would like to fine-tune the model towards the data that I will be feeding it. My pipeline would be to binarize the images using sbb_binarize, then manually edit them...

martholomew

OSError: SavedModel file does not exist at: saved_model_2021_03_09/assets//{saved_model.pbtxt|saved_model.pb}

3

```
❯ sbb_binarize --model-dir saved_model_2021_03_09 actevedef_718448162.first-page/OCR-D-IMG/OCR-D-IMG_00000024.tif test.tif
Traceback (most recent call last):
  File "/home/b-mg106/.virtualenvs/sbb_binarization_issue-47/bin/sbb_binarize", line 8, in <module>
    sys.exit(main())
[...]
  File "/home/b-mg106/.virtualenvs/sbb_binarization_issue-47/lib/python3.9/site-packages/tensorflow/python/saved_model/loader_impl.py", line 116, in parse_saved_model
    raise IOError(
OSError: SavedModel file does...
```

mikegerber

Transformer model integration

adds a hybrid CNN-Transformer model

cneud

Batch-prediction across multiple GPUs and more efficient patch-prediction

7

In order to batch-binarize thousands of images, I've rewritten the prediction script to allow us to predict around 1500-2000 images per hour on a decent machine with two GPUs. The...

apacha
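
One common way to spread a large batch over several GPUs is to split the file list into one shard per device and run one worker per shard. The sketch below illustrates only the sharding step and is not the rewritten prediction script described above; `shard_round_robin` is a hypothetical helper name.

```python
def shard_round_robin(items, num_shards):
    """Split items into num_shards lists, assigning items in round-robin order."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    items = list(items)
    # Shard i receives items i, i + num_shards, i + 2 * num_shards, ...
    return [items[i::num_shards] for i in range(num_shards)]
```

For example, `shard_round_robin(paths, 2)` yields two lists of nearly equal length. Each list could then be handed to a separate worker process that is restricted to one GPU, for instance by setting the `CUDA_VISIBLE_DEVICES` environment variable before the worker starts.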

quality in very low contrast regime

I have material with typewritten forms that is very challenging (to any binarization method), because the typewriter sometimes fades out, while the printing ink near it blasts in a dark...

bertsky

question

use predict_generator to better utilize GPU

3

When the model is applied in patch mode (the default), a loop over the windows is run (on CPU / in Numpy) and passed to `model.predict()` as a single image...

bertsky

enhancement
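
The idea in this request, passing several windows per call instead of one, can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's patch loop: `predict_fn` is a hypothetical stand-in for a call such as `model.predict()`, the patch and batch sizes are arbitrary, and the image is assumed to be a 2-D array at least as large as one patch.

```python
import numpy as np


def iter_windows(height, width, patch_h, patch_w):
    """Yield (y, x) top-left corners of patches covering the image.

    The last row and column are shifted inward so every patch is full-size.
    """
    ys = list(range(0, max(height - patch_h, 0) + 1, patch_h))
    xs = list(range(0, max(width - patch_w, 0) + 1, patch_w))
    if ys[-1] + patch_h < height:
        ys.append(height - patch_h)
    if xs[-1] + patch_w < width:
        xs.append(width - patch_w)
    for y in ys:
        for x in xs:
            yield y, x


def predict_in_batches(image, predict_fn, patch_h=4, patch_w=4, batch_size=8):
    """Apply predict_fn to stacked patches and reassemble the results."""
    height, width = image.shape[:2]
    corners = list(iter_windows(height, width, patch_h, patch_w))
    output = np.zeros((height, width), dtype=np.float32)
    for start in range(0, len(corners), batch_size):
        chunk = corners[start:start + batch_size]
        batch = np.stack([image[y:y + patch_h, x:x + patch_w] for y, x in chunk])
        preds = predict_fn(batch)  # one call per batch instead of one per window
        for (y, x), pred in zip(chunk, preds):
            # Overlapping regions are simply overwritten by the later patch.
            output[y:y + patch_h, x:x + patch_w] = pred
    return output
```

With a batch size of 8, an image that yields 64 windows needs 8 calls to `predict_fn` rather than 64, which is where the better GPU utilization would come from.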
