core
Convert ocrd process syntax to Nextflow script
The idea behind this story is to provide a basic function that can convert OCR-D processor commands provided as plain text into Nextflow scripts.
For example, consider a file with the following content as a workflow:
ocrd-cis-ocropy-binarize -I "OCR-D-IMG" -O "OCR-D-BIN"
ocrd-anybaseocr-crop -I "OCR-D-BIN" -O "OCR-D-CROP"
ocrd-skimage-binarize -I "OCR-D-CROP" -O "OCR-D-BIN2" -P method li
ocrd-skimage-denoise -I "OCR-D-BIN2" -O "OCR-D-BIN-DENOISE" -P level-of-operation page
ocrd-tesserocr-deskew -I "OCR-D-BIN-DENOISE" -O "OCR-D-BIN-DENOISE-DESKEW" -P operation_level page
ocrd-cis-ocropy-segment -I "OCR-D-BIN-DENOISE-DESKEW" -O "OCR-D-SEG" -P level-of-operation page
ocrd-cis-ocropy-dewarp -I "OCR-D-SEG" -O "OCR-D-SEG-LINE-RESEG-DEWARP"
ocrd-calamari-recognize -I "OCR-D-SEG-LINE-RESEG-DEWARP" -O "OCR-D-OC" -P checkpoint_dir qurator-gt4histocr-1.0
All calls to these processors will be translated into an equivalent Nextflow script. Since I am still working on #208, the content of the workflow file may differ from the example above.
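To make the idea concrete, here is a minimal sketch in Python of what such a conversion could look like. It is not the code of the converter discussed in this thread. The function names `parse_processor_call` and `to_nextflow`, the generated process names, and the layout of the emitted Nextflow script (one DSL2 process per call, file groups passed as `val`) are all assumptions made for illustration. The sketch only handles `-I`, `-O` and `-P`, assumes a purely sequential workflow, and its output has not been validated against a real Nextflow run.

```python
import shlex


def parse_processor_call(line):
    """Split one processor call into executable, file groups and parameters."""
    tokens = shlex.split(line)  # handles the quoted file group names
    call = {"executable": tokens[0], "input": None, "output": None, "params": []}
    i = 1
    while i < len(tokens):
        flag = tokens[i]
        if flag == "-I":
            call["input"] = tokens[i + 1]
            i += 2
        elif flag == "-O":
            call["output"] = tokens[i + 1]
            i += 2
        elif flag == "-P":
            # -P takes a key and a value, e.g. "-P method li"
            call["params"].append((tokens[i + 1], tokens[i + 2]))
            i += 3
        else:
            raise ValueError(f"unsupported token: {flag!r}")
    return call


def to_nextflow(workflow_text):
    """Render a hypothetical Nextflow DSL2 script, one process per call."""
    calls = [parse_processor_call(l) for l in workflow_text.splitlines() if l.strip()]
    lines = ["nextflow.enable.dsl = 2", ""]
    names = []
    for index, call in enumerate(calls):
        # Assumed naming scheme: the index keeps repeated processors unique.
        name = f'{call["executable"].replace("-", "_")}_{index}'
        names.append(name)
        p_args = "".join(f" -P {key} {value}" for key, value in call["params"])
        lines += [
            f"process {name} {{",
            "  input:",
            "    val input_group",
            "    val output_group",
            "  output:",
            "    val output_group",
            "  script:",
            '  """',
            f'  {call["executable"]} -I ${{input_group}} -O ${{output_group}}{p_args}',
            '  """',
            "}",
            "",
        ]
    lines.append("workflow {")
    previous = None
    for name, call in zip(names, calls):
        # Sequential assumption: each step reads what the previous step emitted.
        source = f'"{call["input"]}"' if previous is None else f"{previous}.out"
        lines.append(f'  {name}({source}, "{call["output"]}")')
        previous = name
    lines.append("}")
    return "\n".join(lines)
```

Calling `print(to_nextflow(open("workflow.txt").read()))` on a file with the calls listed above (the file name is hypothetical) would print one process definition per line plus a `workflow` block that chains them in order.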
We decided to only accept Nextflow scripts, so a converter is not needed.
I am surprised. Would you mind detailing the logic behind this decision?
If anything, shouldn't there be some backwards compatibility at least?
@bertsky
Check the discussion here: https://github.com/OCR-D/spec/pull/208
Thanks, but I cannot see any arguments for not providing a converter tool from the existing ocrd process workflow syntax to the Nextflow syntax there. IMO this would be very relevant, esp. in the beginning.
I was not aware of this issue. But yes, I agree with Robert. A converter would be nice to have, at least for backwards compatibility.
we decided to only accept nextflow scripts so a converter is not needed
This was about an intermediate format between the purely sequential ocrd process calls and the NF scripts. For purely sequential workflows, conversion between ocrd process (which is in essence just a list of command line calls anyway) and NF should be possible in both directions, and where possible we will provide both (e.g. in https://ocr-d.de/en/workflows).
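As a rough illustration of what "purely sequential" could mean in practice, the following sketch checks that every call reads the file group written by the call before it. The helper names `file_groups` and `is_purely_sequential` are hypothetical, and the check assumes each line carries exactly one `-I` and one `-O` with a single file group each.

```python
import shlex


def file_groups(line):
    """Return the (input, output) file groups of one processor call."""
    tokens = shlex.split(line)
    return tokens[tokens.index("-I") + 1], tokens[tokens.index("-O") + 1]


def is_purely_sequential(workflow_text):
    """True if each step's input group is the previous step's output group."""
    steps = [file_groups(l) for l in workflow_text.splitlines() if l.strip()]
    return all(prev[1] == cur[0] for prev, cur in zip(steps, steps[1:]))
```

A workflow that passes this check is a plain chain of command line calls; one that fails it (for example, a step reading from an earlier, non-adjacent file group) would need more care when mapped to or from a Nextflow script.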
The repo for the code is here: https://github.com/MehmedGIT/OtoN_Converter
Waiting for discussion when Mehmet is back from vacation and for review by Triet; then the repo will be moved to OCR-D.
The OtoN converter is finished. Currently, there are no known issues or bugs.
Source: https://github.com/MehmedGIT/OtoN_Converter
required review from @kba