usage of ocrd-cis-postcorrect in ocrd_all

Open • cneud opened this issue 3 years ago • 1 comment

Running ocrd_cis/ocrd-cis-postcorrect requires additional components that afaict are currently not installed with ocrd_all.

See https://github.com/cisocrgroup/ocrd_cis/issues/51#issuecomment-667015061

In order to run our post correction, both our profiler and a corresponding language backend have to be installed on the system. The configuration variable profilerPath (which should more appropriately be named profilerCommand) must point to the profiler executable, and the profilerConfig variable must point to the corresponding language configuration file. There is a manual for the profiler and the language backend in our repositories.

  1. Profiler / Installation
  2. Language resources / Installation

The relevant section of the Workflow Guide suggests a workaround:

If you don't want to use a profiler, you can set the value for "profilerConfig" to "ignored". In this case, your profiler.bash should look like this:

    #!/bin/bash
    cat > /dev/null
    echo '{}'

You need to pass your local path to the model on your hard drive as parameter value for this processor to work!
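For illustration, here is a minimal sketch of what that workaround amounts to in practice. It writes the stub profiler.bash to a scratch directory, pipes some arbitrary input into it, and writes a parameter fragment containing only the two keys named in this thread (profilerPath and profilerConfig). The scratch directory, the variable names and the file name params.json are my own assumptions; the model parameter is left out because its name is not given here.

```shell
#!/bin/bash
set -eu

# Hypothetical scratch directory, used only for this demonstration.
workdir="$(mktemp -d)"
stub="$workdir/profiler.bash"

# The stub from the Workflow Guide: discard everything on stdin,
# then print an empty JSON object in place of a real profile.
cat > "$stub" <<'EOF'
#!/bin/bash
cat > /dev/null
echo '{}'
EOF
chmod +x "$stub"

# Feed arbitrary input; the stub ignores it entirely.
profile="$(printf 'some OCR tokens\n' | "$stub")"
echo "profile: $profile"

# Parameter fragment with only the two keys mentioned in this issue.
# A real run also needs the path to the model (parameter name not shown here).
params="$workdir/params.json"
printf '{"profilerPath": "%s", "profilerConfig": "ignored"}\n' "$stub" > "$params"
cat "$params"
```

Whatever is sent to the stub, the "profile" it returns is always empty, so the post-correction step receives no profiler output at all.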

However, in light of the above comment, how useful is this? Will it still perform corrections and to what extent will the rate of corrections drop without the Profiler?

Should the missing components be included in ocrd_all? Or could the documentation be extended to describe the effect of running ocrd-cis-postcorrect with and without the Profiler?

cneud • Aug 20 '20 22:08

However, in light of the above comment, how useful is this? Will it still perform corrections and to what extent will the rate of corrections drop without the Profiler?

AFAICT not very useful. IIRC there still is a re-ranking component running after the profiler that collects and weighs all candidates, but if no profiler runs, then only the pre-existing OCR hypotheses are taken into account (i.e. no edits and no ranking based on maximum entropy adaptation). So if you are running single-OCR and without alternatives, you would get no change – IIUC.

But these are all very good questions that really only @finkf can answer.

bertsky • Oct 08 '20 20:10