ochre
About OCR_aligned and lost or missing text
Hi, I'm working on OCR post-correction tasks, and Ochre has helped me a lot. I still have some questions and look forward to your reply. When using Ochre for OCR post-correction, we only have the OCR_input. So how can I get OCR_aligned from OCR_input without the gold standard (gs)? Otherwise, how should lost or missing text be handled without aligned text? Thanks!
The task ochre performs is a supervised machine learning task. So, without a gold standard, you can't create aligned data or train a (supervised) model.
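To illustrate why alignment needs both texts, here is a minimal sketch of character-level alignment between an OCR string and its gold standard. It is not ochre's own alignment code; it uses Python's standard `difflib.SequenceMatcher` as a stand-in, and the `align` function and the `@` padding character are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher

PAD = "@"  # hypothetical placeholder marking a character missing on one side


def align(ocr, gs, pad=PAD):
    """Return (ocr_aligned, gs_aligned), two strings of equal length.

    Wherever one side has fewer characters than the other, it is
    padded with `pad`, so position i in one string corresponds to
    position i in the other.
    """
    ocr_out, gs_out = [], []
    matcher = SequenceMatcher(None, ocr, gs, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a, b = ocr[i1:i2], gs[j1:j2]
        width = max(len(a), len(b))
        # Pad the shorter chunk so both chunks have the same width.
        ocr_out.append(a.ljust(width, pad))
        gs_out.append(b.ljust(width, pad))
    return "".join(ocr_out), "".join(gs_out)


ocr_aligned, gs_aligned = align("Helo wrld", "Hello world")
print(ocr_aligned)
print(gs_aligned)
```

The padding positions can only be found by comparing against the gold standard, which is why aligned data cannot be derived from OCR_input alone.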
Sorry, maybe I didn't express myself clearly. I mean that after supervised training (for which the training data must have a gold standard), how can I use the trained ochre model for actual OCR post-correction tasks? In actual tasks we usually don't have a gold standard, and we want corrected text that is similar to the gold standard. In that case, how can I get OCR_aligned from the raw OCR_input? Thanks!
The README specifies how to use a trained model to do post-correction: https://github.com/KBNLresearch/ochre#ocr-post-correction
If you want to calculate performance for this text, you'd still need to have ground truth/gold standard.
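As a rough sketch of what calculating performance involves, the snippet below computes a character error rate (CER) from a Levenshtein edit distance. This is an assumed metric for illustration, not necessarily the one ochre's own evaluation uses, and `edit_distance` and `cer` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis, gold):
    """Character error rate: edit distance divided by gold-standard length."""
    if not gold:
        raise ValueError("gold standard text must not be empty")
    return edit_distance(hypothesis, gold) / len(gold)


# Compare raw OCR and a (made-up) corrected text against the gold standard.
gold = "Hello world"
print(cer("Helo wrld", gold))
print(cer("Hello world", gold))
```

The `gold` argument is required in every call, which is the point: the corrected text can be produced without a gold standard, but scoring it cannot.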