jbarth-ubhd

Results 41 issues of jbarth-ubhd

There is a `` left in the workflow documentation: ![image](https://user-images.githubusercontent.com/30653779/183404334-416d7a8d-94c8-41e3-96b0-586e1b56eb85.png) PS: are there any use cases for `pc-segmentation`? As far as I remember this was the worst of all...

``` Examples: * To segment existing regions into lines (and only lines) only: `segmentation_level="line"`, `textequiv_level="line"`, `model=""` * To segment existing regions into lines (and only lines) and recognize text: `segmentation_level="line"`,...

I have a small Perl script here that generates workflow variants according to the current documentation (Steps 0..14). No `for(...)` if only 1 processor is recommended. @bertsky, could you have a look...

I get `ValueError: tile cannot extend outside image` Images (850 MB): https://digi.ub.uni-heidelberg.de/diglitData/v/testset-5-zeitschr-ca-1870.zip ``` File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke return callback(*args, **kwargs) File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.6/site-packages/ocrd/cli/process.py", line 27, in process_cli run_tasks(mets, log_level,...

bug
wontfix

Perhaps a problem only in combination with ocrd-sbb-binarize(?) ```...> ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P model /usr/local/ocrd_models/sbb/binarization/models (venv) jb@pers109:~/literatur_schoenen_wissenschaften1780a> ocrd-anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP 16:04:18.388 INFO OcrdAnybaseocrCropper - INPUT FILE...

Would it be possible to exceed the ~32-character limit of agrep by using a 64-bit `unsigned long` instead of `unsigned`? I experimented a bit with using `unsigned long` and doubling...
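
The link between word width and pattern length can be illustrated with a minimal sketch of the bitap (Shift-And) technique, which keeps one state bit per pattern character. This is not agrep's source code and covers exact matching only; the function name `bitap_search` and the `word_bits` parameter are hypothetical and exist only to simulate a fixed-width machine word in Python.

```python
def bitap_search(text, pattern, word_bits=64):
    """Return the index of the first exact match of pattern in text, or -1.

    Illustrative Shift-And (bitap) search, not agrep's implementation.
    word_bits simulates the width of the machine word that holds the state.
    """
    m = len(pattern)
    if m == 0:
        return 0
    if m > word_bits:
        # One state bit per pattern character: the word width caps the length.
        raise ValueError("pattern longer than %d characters" % word_bits)

    word_mask = (1 << word_bits) - 1

    # masks[ch] has bit i set if pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # set when the whole pattern has matched
    state = 0              # bit i set: pattern[0..i] matches text ending here
    for pos, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0) & word_mask
        if state & accept:
            return pos - m + 1
    return -1
```

With `word_bits=32` a 40-character pattern is rejected, while the default of 64 accepts it, which is the effect a wider state word would be expected to have under these assumptions.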

### Current Behavior I ran a two-column German text (portrait + landscape) at (ImageMagick) angles of 0°, 90°, 180°, 270°, each ± 3°, partly with ±0.1° jitter. PDF files (converted to...

layout analysis

I've created *.exp0.gt.txt files as a basis for manual ground truth creation using [Shreeshrii's shell script](https://github.com/tesseract-ocr/tesstrain/issues/7#issuecomment-419714852), and the files contain a space before and after the text (no newlines etc.). Example:...

The word list in eng.traineddata contains relatively many ambiguous words compared with fra, deu, ita, spa (checked with https://gist.github.com/jbarth-ubhd/8d5ceb4035bf2d89700117a311209f20 ): AMBIGIOUS (EXCERPT): Abstract;In addRole Alberta.ca AngMarTV AppSight aXe BarCap...

**Describe the bug** XQuery: ```xquery let $band := doc("/db/resources/digiZeitung/heidelberger_tageblatt1884.xml") let $seq := ("dmd", "dmd00001") return (# exist:force-index-use #) { $band//*[@ID=$seq] } ``` with collection.xconf: `.........` says »exerr:ERROR XQDYxxxx: Can not...