
Working without aligned file

Open omrishsu opened this issue 7 years ago • 2 comments

Hi, I'm conducting research on OCR corpora, and I would like to use this project to evaluate how differences in the training corpus affect the quality of the post-processing. However, I have OCR files and GS (gold standard) files without the aligned JSON file that is needed. Is there a way to generate it (maybe with a Smith-Waterman algorithm?) or to work without it?

Thanks, Omri

omrishsu, Jan 14 '18 15:01

Thank you for your interest in ochre! Whether you need the aligned files depends on what you want to do (how you want to calculate performance). For calculating character error rate and word error rate, you don't need them. For doing word level error analysis, you need them, but if you use the workflows provided by ochre, they are generated automatically.
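To illustrate why character error rate and word error rate need no alignment file, here is a minimal sketch that computes both directly from a gold standard string and an OCR string using Levenshtein distance. This is not ochre's own implementation; the function names `edit_distance`, `char_error_rate`, and `word_error_rate` are hypothetical, and whitespace tokenization for words is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(gs_text, ocr_text):
    """Character edits needed to turn the OCR text into the gold standard,
    divided by the gold standard length."""
    if not gs_text:
        raise ValueError("gold standard text is empty")
    return edit_distance(gs_text, ocr_text) / len(gs_text)


def word_error_rate(gs_text, ocr_text):
    """Same idea at word level; words are split on whitespace (an assumption)."""
    gs_words = gs_text.split()
    if not gs_words:
        raise ValueError("gold standard text has no words")
    return edit_distance(gs_words, ocr_text.split()) / len(gs_words)


if __name__ == "__main__":
    gs = "the quick brown fox"
    ocr = "the quick brovvn fox"
    print(char_error_rate(gs, ocr), word_error_rate(gs, ocr))
```

Both rates only count how many edits separate the two texts. Word level error analysis additionally needs to know which OCR word corresponds to which gold standard word, which is where the aligned files come in.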

I am in the process of putting the workflows online and providing documentation. So, I hope you can wait a little longer.

Is your dataset publicly available? If so, I'd like to include it in my list :)

jvdzwaan, Jan 16 '18 21:01

Hi, sorry for disappearing (I was working on another research project). I've updated my question in a separate post: https://github.com/KBNLresearch/ochre/issues/4 Thanks!

omrishsu, Feb 24 '18 08:02