ochre
Working without aligned file
Hi, I'm conducting research on OCR corpora, and I would like to use this project to evaluate how differences in the training corpus affect the quality of the post-processing. However, I have OCR files and GS files, but not the aligned JSON files that are needed. Is there a way to generate them (maybe with a Smith-Waterman algorithm?) or to work without them?
Thanks, Omri
Thank you for your interest in ochre! Whether you need the aligned files depends on what you want to do (how you want to calculate performance). For calculating character error rate and word error rate, you don't need them. For doing word level error analysis, you need them, but if you use the workflows provided by ochre, they are generated automatically.
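To illustrate why character error rate and word error rate do not need aligned files, here is a minimal sketch in plain Python that computes both directly from an OCR text and its GS text using edit distance. This is not ochre's implementation; the function names `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical, and splitting words on whitespace is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the length of the reference."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(gs_text, ocr_text):
    """Character error rate: edit distance over characters."""
    return error_rate(gs_text, ocr_text)


def wer(gs_text, ocr_text):
    """Word error rate: edit distance over whitespace-separated tokens."""
    return error_rate(gs_text.split(), ocr_text.split())
```

For example, `cer("hello world", "hcllo wor1d")` counts two substituted characters out of eleven, and `wer` on the same pair is 1.0 because both words contain an error.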
I am in the process of putting the workflows online and providing documentation. So, I hope you can wait a little longer.
Is your dataset publicly available? If so, I'd like to include it in my list :)
Hi, sorry for disappearing (I was working on another research project). I've posted an updated version of my question as a separate issue: https://github.com/KBNLresearch/ochre/issues/4 Thanks!