Permanent failure with VU recipe

Open hadiasheri opened this issue 6 years ago • 2 comments

Hi, I'm trying to run the code with the VU DNC dataset. The link you provided didn't work, so I downloaded it from here. Now, when I run vudnc-preprocess.cwl as follows:

in_dir="/home/dataset/VU/FoLiACMDI"
ocr_dir_name="/home/dataset/VU/Preprocess/ocr"
gs_dir_name="/home/dataset/VU/Preprocess/gs"
aligned_dir_name="/home/dataset/VU/Preprocess/aligned"
tmp_dir="/home/ochre/vu-tmp/"
tmp_dir_out="/home/ochre/vu-tmp-out/"
cachedir="/home/ochre/cachedir/"
align_m="align_m.csv"
align_c="align_c.csv"
ocr_n="ocr_n.csv"
gs_n="gs_n.csv"

cwltool |cwl-runner ochre/cwl/vudnc-preprocess.cwl --in_dir $in_dir --ocr_dir_name $ocr_dir_name --gs_dir_name $gs_dir_name --aligned_dir_name $aligned_dir_name --ocr_n $ocr_n --gs_n $gs_n --align_m $align_m --align_c $align_c

However, it fails permanently with the following message:

[step merge-json] Cannot make job: Value for file:///home/ochre/ochre/cwl/align-texts-wf.cwl#merge-json/in_files not specified

[workflow align-texts-wf] completed permanentFail

I'd be grateful if you could help me figure out the problem. Thanks, H

hadiasheri avatar Aug 19 '18 14:08 hadiasheri

I think the workflow fails because of changes to nlppln. I'll try to see if I can fix that later.

Also, I really recommend using a different dataset than the vudnc corpus. It is just too noisy. Here is a poster showing that the most common error is a hyphenation error ('- ' that should be replaced with ''; that is just too easy): https://doi.org/10.5281/zenodo.1189245
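
To illustrate why that error is "too easy", here is a minimal sketch of the replacement, assuming every '- ' in the OCR text really is a hyphenation error. The function name `fix_hyphenation` is made up for this example and is not part of ochre or nlppln.

```python
def fix_hyphenation(text: str) -> str:
    """Naively rejoin words that were split by a hyphenation error.

    Assumption: every '- ' (hyphen followed by a space) is an OCR
    hyphenation error. A spaced dash used as punctuation ('a - b')
    would be damaged by this rule as well.
    """
    return text.replace('- ', '')


if __name__ == '__main__':
    # Hypothetical sample string, not taken from the vudnc corpus.
    print(fix_hyphenation('een voor- beeld van hyphen- ation'))
```

A single string replacement already handles this error class, so a corpus dominated by it says little about how well a model handles harder OCR mistakes.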

jvdzwaan avatar Aug 22 '18 19:08 jvdzwaan

Okay, it should work again. Be careful to read the updated documentation in the README. Also, don't forget to update nlppln.

For future reference, this is the relevant commit: 9ee6d7cca72bb9bcd074e1843b12ceea122662ce

jvdzwaan avatar Sep 11 '18 18:09 jvdzwaan