Convert ocrd process syntax to Nextflow script

Open MehmedGIT opened this issue 3 years ago • 10 comments

The idea behind this story is to provide a basic function that can convert OCR-D processor commands provided as plain text into Nextflow scripts.

For example, consider a file with the following content as a workflow:

ocrd-cis-ocropy-binarize -I "OCR-D-IMG" -O "OCR-D-BIN"
ocrd-anybaseocr-crop -I "OCR-D-BIN" -O "OCR-D-CROP"
ocrd-skimage-binarize -I "OCR-D-CROP" -O "OCR-D-BIN2" -P method li
ocrd-skimage-denoise -I "OCR-D-BIN2" -O "OCR-D-BIN-DENOISE" -P level-of-operation page
ocrd-tesserocr-deskew -I "OCR-D-BIN-DENOISE" -O "OCR-D-BIN-DENOISE-DESKEW" -P operation_level page
ocrd-cis-ocropy-segment -I "OCR-D-BIN-DENOISE-DESKEW" -O "OCR-D-SEG" -P level-of-operation page
ocrd-cis-ocropy-dewarp -I "OCR-D-SEG" -O "OCR-D-SEG-LINE-RESEG-DEWARP"
ocrd-calamari-recognize -I "OCR-D-SEG-LINE-RESEG-DEWARP" -O "OCR-D-OC" -P checkpoint_dir qurator-gt4histocr-1.0

All calls to these specific processors will be translated to an equivalent Nextflow script. Since I am still working on #208, the content of the workflow file may differ from the example above.
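
To make the idea concrete, here is a minimal Python sketch of such a conversion. It is not the actual converter and has not been checked against a Nextflow installation. The names `parse_step`, `parse_workflow`, `to_nextflow`, the `step_N` process names and the `params.mets` parameter are all hypothetical. It assumes every line has exactly one `-I`, one `-O` and zero or more `-P key value` options, and that each step simply waits for the previous one.

```python
import shlex


def parse_step(line):
    """Parse one processor call into its name, file groups, and -P parameters."""
    tokens = shlex.split(line)
    step = {"processor": tokens[0], "input": None, "output": None, "params": {}}
    i = 1
    while i < len(tokens):
        flag = tokens[i]
        if flag == "-I":
            step["input"] = tokens[i + 1]
            i += 2
        elif flag == "-O":
            step["output"] = tokens[i + 1]
            i += 2
        elif flag == "-P":
            step["params"][tokens[i + 1]] = tokens[i + 2]
            i += 3
        else:
            raise ValueError(f"unsupported option: {flag}")
    if step["input"] is None or step["output"] is None:
        raise ValueError(f"missing -I or -O in: {line}")
    return step


def parse_workflow(text):
    """Parse a plain-text workflow, one processor call per non-empty line."""
    return [parse_step(line) for line in text.splitlines() if line.strip()]


def to_nextflow(steps):
    """Emit a DSL2 script with one process per step, chained in order."""
    # Assumption: params.mets should be overridden with an absolute path,
    # because Nextflow runs each task in its own work directory.
    lines = ["nextflow.enable.dsl = 2", "", 'params.mets = "mets.xml"', ""]
    for n, step in enumerate(steps, start=1):
        cmd = [step["processor"], "-m", "${params.mets}",
               "-I", shlex.quote(step["input"]),
               "-O", shlex.quote(step["output"])]
        for key, value in step["params"].items():
            cmd += ["-P", shlex.quote(key), shlex.quote(value)]
        lines += [
            f"process step_{n} {{",
            "  input:",
            "    val ready",      # dummy value, only used to enforce ordering
            "  output:",
            "    val true",
            "  script:",
            '  """',
            "  " + " ".join(cmd),
            '  """',
            "}",
            "",
        ]
    lines.append("workflow {")
    for n in range(1, len(steps) + 1):
        arg = "true" if n == 1 else f"step_{n - 1}.out"
        lines.append(f"  step_{n}({arg})")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling `print(to_nextflow(parse_workflow(open("workflow.txt").read())))` on a file like the one above (the file name is just an example) would print one `process` block per line plus a `workflow` block that runs them in order.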

MehmedGIT avatar Jul 01 '22 10:07 MehmedGIT

we decided to only accept nextflow scripts so a converter is not needed

lena-hinrichsen avatar Jul 05 '22 08:07 lena-hinrichsen

we decided to only accept nextflow scripts so a converter is not needed

I am surprised. Would you mind detailing the logic behind this decision?

If anything, shouldn't there be some backwards compatibility at least?

bertsky avatar Jul 05 '22 08:07 bertsky

@bertsky

Check the discussion here: https://github.com/OCR-D/spec/pull/208

MehmedGIT avatar Jul 05 '22 10:07 MehmedGIT

Check the discussion here: OCR-D/spec#208

Thanks, but I cannot see any arguments for not providing a converter tool from the existing ocrd process workflow syntax to the Nextflow syntax there. IMO this would be very relevant, esp. in the beginning.

bertsky avatar Jul 05 '22 11:07 bertsky

I was not aware of this issue. But yes, I agree with Robert. A converter would be nice to have, at least for backwards compatibility.

tdoan2010 avatar Jul 05 '22 12:07 tdoan2010

we decided to only accept nextflow scripts so a converter is not needed

This was about an intermediate format between the purely sequential ocrd process calls and the NF scripts. For purely sequential workflows, a conversion between ocrd process (which is in essence just a list of command line calls anyway) to/from NF should be possible and where possible we will provide both (e.g. in https://ocr-d.de/en/workflows).
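
As a rough illustration of what "purely sequential" could mean in practice, the Python sketch below checks that each call reads exactly the file group the previous call wrote. This reading of the term is an assumption, and `file_groups` and `is_purely_sequential` are hypothetical helper names. A call with several comma-separated input groups would simply be reported as non-sequential here.

```python
import shlex


def file_groups(line):
    """Return the (-I, -O) file groups of one processor call."""
    tokens = shlex.split(line)
    return tokens[tokens.index("-I") + 1], tokens[tokens.index("-O") + 1]


def is_purely_sequential(lines):
    """True if every step's input group is the previous step's output group."""
    groups = [file_groups(line) for line in lines if line.strip()]
    return all(
        prev_out == next_in
        for (_, prev_out), (next_in, _) in zip(groups, groups[1:])
    )
```

A workflow that passes this check is the kind of case where a mechanical conversion in either direction seems feasible.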

kba avatar Jul 06 '22 15:07 kba

The repo for the code is here https://github.com/MehmedGIT/OtoN_Converter

tdoan2010 avatar Jul 18 '22 08:07 tdoan2010

waiting for discussion when Mehmet is back from vacation and review by Triet, then repo will be moved to OCR-D.

krvoigt avatar Jul 18 '22 08:07 krvoigt

The OtoN converter is finished. Currently, there are no known issues or bugs.

Source: https://github.com/MehmedGIT/OtoN_Converter

MehmedGIT avatar Jul 29 '22 15:07 MehmedGIT

required review from @kba

krvoigt avatar Aug 01 '22 08:08 krvoigt