monoses
Step6_zero_phrase_filtering problem
While training monoses, I got the following error in Step 7:
Traceback (most recent call last):
  File "/home/xyz/monoses/training/tuning/tune.py", line 335, in <module>
    main()
  File "/home/xyz/monoses/training/tuning/tune.py", line 322, in main
    extract_zmert_params(tmp + '/dcfg.txt.ZMERT.final'))
  File "/home/xyz/monoses/training/tuning/tune.py", line 73, in extract_zmert_params
    with open(path, encoding='utf-8', errors='surrogateescape') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpv1m8y_i1/dcfg.txt.ZMERT.final'
clean-corpus.perl: processing /home/xyz/models/monoses/src-tgt/tmpzbtcque6/train.bt & .trg to /home/xyz/models/monoses/src-tgt/tmpzbtcque6/train-supervised/clean, cutoff 3-80, ratio 9
From the log file and intermediate results, I found the following:
- It successfully generated phrase tables.
- However, in Step 6 it filtered 0% of the phrase pairs, which looks suspicious to me:
  P(f|e) filter limit: 100
  Filtering using P(e|f) only. n=100
  ..................................................[n:500000]
  ..................................................[n:1000000]
  ..................................................[n:1500000]
  ..................................................[n:2000000]
  ..................................................[n:2500000]
  ..................................................[n:3000000]
  ..................................................[n:3500000]
  ..................................................[n:4000000]
  ..................................................[n:4500000]
  ..................................................[n:5000000]
  ..................................................[n:5500000]
  ..................................................[n:6000000]
  ..................................................[n:6500000]
  ..................................................[n:7000000]
  ..................................................[n:7500000]
  ..................................................[n:8000000]
  ..................................................[n:8500000]
  ..................................................[n:9000000]
  ..................................................[n:9500000]
  ..................................................[n:10000000]
  unfiltered phrases pairs: 10000000
  P(f|e) filter [first]: 0 (0%)
  significance filter: 0 (0%)
  TOTAL FILTERED: 0 (0%)
  FILTERED phrase pairs: 10000000 (100%)
- Then, in Step 7, while running the decoder, it printed:
  Call to decoder returned 1; was expecting 0.
  Z-MERT exiting prematurely (MertCore returned 30)...
I can confirm that I have run into this issue many times. The temporary directory is deleted before extract_zmert_params is called, and the failure is intermittent.
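For anyone trying to debug this, it may help to keep the temporary directory around when tuning fails, so the Z-MERT and decoder logs can still be inspected. Below is a minimal Python sketch of that idea. It is not the code in tune.py: `run_in_kept_tmpdir` and `read_final_params` are hypothetical helper names, and the only detail taken from the traceback above is the `open(...)` call with `errors='surrogateescape'`.

```python
import os
import shutil
import tempfile


def run_in_kept_tmpdir(work, prefix="tune_"):
    """Run work(tmp) in a fresh temp directory; delete it only on success."""
    tmp = tempfile.mkdtemp(prefix=prefix)
    try:
        result = work(tmp)
    except Exception:
        # Leave the directory on disk so logs and partial outputs can be inspected.
        print("work failed; temporary files kept in", tmp)
        raise
    shutil.rmtree(tmp)
    return result


def read_final_params(path):
    """Read a tuning output file, with a clearer error if it was never written."""
    if not os.path.exists(path):
        raise RuntimeError(
            "expected tuning output not found: " + path
            + " (the tuner or decoder may have exited early; check its log)"
        )
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        return f.read()
```

With something like this, a missing `dcfg.txt.ZMERT.final` shows up as an explicit "tuner exited early" error, and the directory that would explain why is still there afterwards.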
OK, I finally figured out my related issue.
I got: Z-MERT exiting prematurely (MertCore returned 1)...
This was due to moses2 segfaulting under the hood. It segfaulted because one of the lines in the dev file I was passing to it was too long. I truncated each line in the dev set to 200 characters, and the segfault went away. If you're doing unsupervised tuning, I recommend truncating the dev file you pass to moses2.
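The truncation can be done with a few lines of Python. This is only a sketch of the workaround described above: `truncate_lines` is a hypothetical helper, not part of monoses, and the 200-character limit is simply the value that worked in this case.

```python
def truncate_lines(src_path, dst_path, max_chars=200):
    """Copy src_path to dst_path, cutting each line to at most max_chars characters.

    Returns the number of lines that had to be shortened.
    """
    truncated = 0
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for line in src:
            text = line.rstrip("\n")
            if len(text) > max_chars:
                # Cut by characters, then drop any trailing space left by the cut.
                text = text[:max_chars].rstrip()
                truncated += 1
            dst.write(text + "\n")
    return truncated
```

Usage would look like `truncate_lines("dev.txt", "dev.trunc.txt")`, where both file names are placeholders. Note that cutting by characters can split the last token of a long line. For a dev set this is usually acceptable, but dropping over-long lines entirely is an alternative.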
Note: This also happened when I accidentally passed two files with different numbers of lines to --supervised-tuning.
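A quick check before starting a run can catch mismatched tuning files. The sketch below assumes the two files are meant to be line-aligned, one sentence per line. `count_lines` and `check_parallel` are hypothetical helpers, not something monoses provides.

```python
def count_lines(path):
    """Count lines without decoding, so odd encodings cannot break the check."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_parallel(src_path, trg_path):
    """Raise ValueError if the two files differ in line count; else return the count."""
    n_src = count_lines(src_path)
    n_trg = count_lines(trg_path)
    if n_src != n_trg:
        raise ValueError(
            "line count mismatch: %s has %d lines, %s has %d lines"
            % (src_path, n_src, trg_path, n_trg)
        )
    return n_src
```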
Note 2: Failing to run the Moses tokenizer or the escape-special-chars.perl script on the input can also cause moses2 to segfault within Z-MERT (https://github.com/moses-smt/mosesdecoder/tree/master/scripts/tokenizer).
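To spot input that probably skipped the escaping step, one option is to scan the dev file for raw special characters. The sketch below rests on an assumption: that `|`, `<`, `>`, `[` and `]` are among the characters escape-special-chars.perl rewrites into entities, so seeing them unescaped is a warning sign. It does not replace running the Moses scripts, and `find_unescaped` is a hypothetical helper.

```python
import re

# Assumption: these characters have special meaning to Moses and should
# normally reach the decoder only in escaped (entity) form.
SUSPECT = re.compile(r"[|<>\[\]]")


def find_unescaped(path, limit=10):
    """Return up to `limit` (line_number, text) pairs containing suspect characters."""
    hits = []
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for lineno, line in enumerate(f, start=1):
            if SUSPECT.search(line):
                hits.append((lineno, line.rstrip("\n")))
                if len(hits) >= limit:
                    break
    return hits
```

An empty result does not prove the file was tokenized. It only means none of these particular characters appear unescaped.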