ochre
Permanent failure with VU recipe
Hi, I'm trying to run the code with the VU DNC dataset. The link you provided didn't work, so I downloaded it from here. Now, when I run vudnc-preprocess.cwl as follows:
in_dir="/home/dataset/VU/FoLiACMDI"
ocr_dir_name="/home/dataset/VU/Preprocess/ocr"
gs_dir_name="/home/dataset/VU/Preprocess/gs"
aligned_dir_name="/home/dataset/VU/Preprocess/aligned"
tmp_dir="/home/ochre/vu-tmp/"
tmp_dir_out="/home/ochre/vu-tmp-out/"
cachedir="/home/ochre/cachedir/"
align_m="align_m.csv"
align_c="align_c.csv"
ocr_n="ocr_n.csv"
gs_n="gs_n.csv"
cwltool |cwl-runner ochre/cwl/vudnc-preprocess.cwl --in_dir $in_dir --ocr_dir_name $ocr_dir_name --gs_dir_name $gs_dir_name --aligned_dir_name $aligned_dir_name --ocr_n $ocr_n --gs_n $gs_n --align_m $align_m --align_c $align_c
However, it fails permanently with the following message:
[step merge-json] Cannot make job: Value for file:///home/ochre/ochre/cwl/align-texts-wf.cwl#merge-json/in_files not specified
[workflow align-texts-wf] completed permanentFail
I'd be grateful if you could help me figure out the problem. Thanks, H
I think the workflow fails because of changes to nlppln. I'll try to see if I can fix that later.
Also, I really recommend using a different dataset than the vudnc corpus. It is just too noisy. Here is a poster showing that the most common error is a hyphenation error ('- ' that should be replaced with '', which is just too easy to correct): https://doi.org/10.5281/zenodo.1189245
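To illustrate how simple that correction is, here is a minimal sketch. It is not code from ochre; the function name `remove_hyphenation` and the sample sentence are made up for illustration.

```python
def remove_hyphenation(text: str) -> str:
    """Rejoin words split by line-break hyphenation by deleting every '- '."""
    # Hypothetical helper: a plain string replacement, no model required.
    return text.replace('- ', '')


# Made-up OCR output with two hyphenation errors.
ocr_text = 'De rege- ring heeft een besluit geno- men.'
print(remove_hyphenation(ocr_text))  # prints "De regering heeft een besluit genomen."
```

Note that this naive replacement would also delete legitimate occurrences of '- ' (for example in "voor- en nadelen"), so it only shows why this error type is an easy target.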
Okay, it should work again. Be careful to read the updated documentation in the README. Also, don't forget to update nlppln.
For future reference, this is the relevant commit: 9ee6d7cca72bb9bcd074e1843b12ceea122662ce