Janneke van der Zwaan comments

Results: 52 comments of Janneke van der Zwaan

Where to start?

The software provided is (very) experimental. The README specifies the installation process, and I documented as much as possible (for example about what the input data should look like (see...

Additional OCR Post correction datasets

* RETAS - Text alignment software and evaluation dataset - email to obtain - http://ciir.cs.umass.edu/downloads/ocr-evaluation/

Additional OCR Post correction datasets

OCR text, but no gold standard: https://github.com/marriott-library/collections-as-data

print error - ICDAR2017_shared_task_workflows.ipynb

Thanks! The signature of wf.list_steps() changed, so yes, you should use print(wf.list_steps()). Please note that the workflow is about preprocessing the vudnc data; this has nothing to do with the...
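To illustrate why the explicit print() is needed, here is a minimal sketch. It assumes that list_steps() now returns its listing as a string rather than printing it, which is what the comment above suggests. StubWorkflow and the step names are hypothetical stand-ins, not the real nlppln classes or steps.

```python
class StubWorkflow:
    """Hypothetical stand-in; this is NOT the real nlppln workflow class."""

    def __init__(self, step_names):
        self._step_names = list(step_names)

    def list_steps(self):
        # Assumption: the method returns the listing instead of printing it.
        return "\n".join(self._step_names)


wf = StubWorkflow(["step-a", "step-b"])  # made-up step names

# In a script, calling wf.list_steps() on its own displays nothing,
# because the returned string is discarded.
steps_text = wf.list_steps()

# Passing the result to print() makes the listing visible.
print(steps_text)
```

In a notebook, the bare call as the last expression of a cell would show the string's repr with escaped newlines, so print() is still the more readable option there.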

print error - ICDAR2017_shared_task_workflows.ipynb

Unfortunately, ochre is not (yet) fit for training good OCR post-correction models. I plan to work on it in the future, but only as a hobby project. So no promises...

Permanent failure with VU recipe

I think the workflow fails because of changes to nlppln. I'll try to see if I can fix that later. Also, I really recommend using a different dataset than...

Permanent failure with VU recipe

Okay, it should work again. Be careful to read the updated documentation in the README. Also, don't forget to update nlppln. For future reference, this is the relevant commit: 9ee6d7cca72bb9bcd074e1843b12ceea122662ce

All chars assumption

Actually, the chars are extracted from all text (train set, test set, and val set). Whether this is correct (fair) is open for discussion. It is probably more correct to...
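To make the trade-off concrete, the sketch below builds a character vocabulary once from all splits and once from the training split only. It is an illustration under stated assumptions, not ochre's actual code: build_char_vocab, encode, the UNK token, and the toy texts are all hypothetical.

```python
UNK = "<unk>"  # hypothetical placeholder for characters missing from the vocabulary


def build_char_vocab(texts):
    """Map every character that occurs in `texts` to an integer index."""
    vocab = {UNK: 0}
    for ch in sorted(set("".join(texts))):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Encode `text` as indices, falling back to UNK for unknown characters."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


# Toy data standing in for the three splits.
train = ["the cat", "a hat"]
val = ["the bat"]
test = ["the café"]

vocab_all = build_char_vocab(train + val + test)  # chars from all text
vocab_train = build_char_vocab(train)             # chars from the train set only

# Characters the model would never see if only the train set were used.
unseen = sorted(set("".join(val + test)) - set("".join(train)))
print("unseen in train:", unseen)
print("train-only encoding:", encode("café", vocab_train))
print("all-splits encoding:", encode("café", vocab_all))
```

With the train-only vocabulary, unseen characters collapse into the UNK index; with the all-splits vocabulary, every character gets its own index, at the cost of letting information about the evaluation data shape the model's input space.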

Error in align_output_to_input

The problem probably has to do with the fact that edlib expects a string instead of bytes. What version of Python are you using? (edlib works best under Python 3.)...
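If the cause is indeed a bytes/str mismatch, one possible workaround is to normalize both inputs to str before aligning them. The helper below is a minimal sketch under that assumption; to_text is a hypothetical name, and UTF-8 is assumed as the encoding of the input files.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first if it is bytes.

    Assumption: byte input is UTF-8 encoded unless stated otherwise.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


# Toy example: one input arrives as bytes, the other as str.
ocr = to_text(b"Tlie quick brown fox")
gold = to_text("The quick brown fox")

# Both values are now str, so they can be handed to an aligner that
# expects text, e.g. edlib.align(ocr, gold) if edlib is installed.
print(type(ocr).__name__, type(gold).__name__)
```

Under Python 3, text read from a file opened in text mode is already str, so the helper simply passes it through unchanged.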

Error in align_output_to_input

This encoding fix was probably only necessary for Python 2.7 (which I still use). Thank you!