Error: data/train-clean-360_anon_sp/feats.scp already exists

Open suhitaghosh10 opened this issue 2 years ago • 5 comments

steps/diagnostic/analyze_alignments.sh --cmd run.pl data/lang exp/tri3b_cleaned
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri3b_cleaned/log/analyze_alignments.log
1 warnings in exp/tri3b_cleaned/log/build_tree.log
27 warnings in exp/tri3b_cleaned/log/acc...log
8 warnings in exp/tri3b_cleaned/log/update..log
33 warnings in exp/tri3b_cleaned/log/align...log
20 warnings in exp/tri3b_cleaned/log/convert..log
9 warnings in exp/tri3b_cleaned/log/fmllr...log
steps/train_sat.sh: Likelihood evolution:
-55.2576 -52.3376 -52.1623 -51.612 -50.1089 -48.6247 -47.4872 -46.7146 -46.154 -45.526 -45.1648 -44.7545 -44.4738 -44.2664 -44.0799 -43.9154 -43.7681 -43.6342 -43.512 -43.3358 -43.2092 -43.1134 -43.0245 -42.9414 -42.8652 -42.793 -42.7238 -42.6577 -42.5951 -42.5037 -42.44 -42.4085 -42.3882 -42.3738
exp/tri3b_cleaned: nj=10 align prob=-45.15 over 355.80h [retry=0.0%, fail=0.0%] states=5952 gauss=150145 fmllr-impr=0.71 over 293.24h tree-impr=9.36
steps/train_sat.sh: done training SAT system in exp/tri3b_cleaned
local/chain/run_tdnn_1d__360.sh
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: data/train-clean-360_anon_sp/feats.scp already exists: refusing to run this (please delete data/train-clean-360_anon_sp/feats.scp if you want this to run)

suhitaghosh10 avatar May 12 '22 09:05 suhitaghosh10

I have a similar problem: each time I rerun the evaluation, the run terminates with an error because some files already exist. Could you please provide an update to cleanup.sh that includes the new files of this challenge?

SarinaMeyer avatar May 12 '22 09:05 SarinaMeyer

Kaldi-based ASR acoustic models (AMs) and the corresponding Kaldi scripts are used for ASR evaluation. The ASR AM training scripts comprise multiple training stages, and some of them include an additional check to avoid repeating processes that have already completed.

For example, in your case:

  1. https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5a8c9f90af2fda729b573ceb1c1f690ed0ea1c1e/baseline/local/nnet3/run_ivector_common.sh#L47
  2. https://github.com/kaldi-asr/kaldi/blob/d673298886e8d62d4c890e5e3eac8491df0b7e12/egs/wsj/s5/utils/data/perturb_data_dir_speed_3way.sh#L52

So you can either specify more precisely the stage from which you want to resume training, or remove the corresponding file as suggested in the Kaldi script.
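As a rough illustration of the "remove the corresponding file" option, here is a minimal sketch of the kind of guard involved and of clearing it. It is a paraphrase for illustration, not the Kaldi script's actual code; the function names `check_dest_is_fresh` and `reset_dest` are hypothetical.

```shell
# Hypothetical helper that mimics the kind of check described above:
# it fails when the destination data directory already has a feats.scp.
check_dest_is_fresh() {
  if [ -f "$1/feats.scp" ]; then
    echo "$1/feats.scp already exists: refusing to run" >&2
    return 1
  fi
  return 0
}

# Hypothetical helper: delete the stale feats.scp so the stage can be redone.
# rm -f does not complain if the file is already absent.
reset_dest() {
  rm -f "$1/feats.scp"
}
```

For the error reported in this issue, calling `reset_dest data/train-clean-360_anon_sp` from the baseline folder would correspond to deleting the file named in the message.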

cleanup.sh was originally designed only for (re-)running the ASR/ASV evaluation stages (with already trained ASR/ASV evaluation models), and could be updated accordingly for the new setup. However, it does not cover the training of ASR/ASV models: each of these processes has multiple (sub)stages, and deciding which data to remove depends on which (sub)stages have completed, so it is not straightforward and requires the user's supervision.

Natalia-T avatar May 12 '22 20:05 Natalia-T

Thanks for the detailed answer. But, when I am running for the first time, shouldn't it run without such errors?

suhitaghosh10 avatar May 13 '22 07:05 suhitaghosh10

But, when I am running for the first time, shouldn't it run without such errors?

Yes, for the first time you should not get such errors.

Natalia-T avatar May 14 '22 00:05 Natalia-T

I have also been experiencing this issue, and after numerous attempts, managed to get a full execution without any 'refusing to run' errors. I created a shell script in the baseline folder and pasted the following into it:

rm -rf data/train-clean-360_anon_sp/feats.scp
rm -rf data/train-clean-360_anon_sp_hires/feats.scp
rm -rf data/train-clean-360_anon_sp_hires_60k/feats.scp
rm -rf exp/tri3b_cleaned_ali_train-clean-360_anon_sp
rm -rf exp/models/user_asr_eval_anon/chain_cleaned/tree_sp/final.mdl

and I am running this each time I'd like to re-run the baseline.
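A slightly more defensive variant of the same idea, as a sketch only: it uses exactly the paths listed above, skips any that do not exist, and prints what it removes. The function name `clean_asr_train_leftovers` and the optional root argument are hypothetical and not part of the challenge repository.

```shell
# Sketch: remove the leftovers listed above, relative to a root directory
# (defaults to the current directory, i.e. the baseline folder).
clean_asr_train_leftovers() {
  root="${1:-.}"
  for path in \
    data/train-clean-360_anon_sp/feats.scp \
    data/train-clean-360_anon_sp_hires/feats.scp \
    data/train-clean-360_anon_sp_hires_60k/feats.scp \
    exp/tri3b_cleaned_ali_train-clean-360_anon_sp \
    exp/models/user_asr_eval_anon/chain_cleaned/tree_sp/final.mdl
  do
    # Only touch paths that are actually present, and report each removal.
    if [ -e "$root/$path" ]; then
      echo "removing $root/$path"
      rm -rf "$root/$path"
    fi
  done
}
```

Run `clean_asr_train_leftovers` from the baseline folder before re-running. As noted earlier in the thread, which files need removing depends on which (sub)stages have completed, so the list may need adjusting for other setups.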

egaznep avatar Jun 18 '22 20:06 egaznep