Pruned transducer stateless7 for AISHELL-1
Training command is:
./pruned_transducer_stateless7/train.py \
--world-size 8 \
--num-epochs 90 \
--use-fp16 1 \
--max-duration 200 \
--exp-dir pruned_transducer_stateless7/exp \
--feedforward-dims "1024,1024,2048,2048,1024" \
--master-port 12535
The tensorboard log is available at https://tensorboard.dev/experiment/PmhYTwXiRwaON3aPdAFwGQ/
It looks abnormal: both the validation loss and the WER are rising.
greedy_search/log-decode-epoch-23-avg-2-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-17-17-43-23:53:greedy_search 7.27 best for dev
greedy_search/log-decode-epoch-23-avg-2-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-17-17-43-23:28:greedy_search 7.91 best for test
greedy_search/log-decode-epoch-58-avg-20-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-12-00-56:33:greedy_search 9.75 best for dev
greedy_search/log-decode-epoch-57-avg-26-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-13-00-35:38:greedy_search 9.77 best for dev
greedy_search/log-decode-epoch-57-avg-19-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-12-45-13:43:greedy_search 9.8 best for dev
greedy_search/log-decode-epoch-58-avg-24-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-12-09-41:33:greedy_search 9.8 best for dev
greedy_search/log-decode-epoch-58-avg-27-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-12-16-10:48:greedy_search 9.8 best for dev
greedy_search/log-decode-epoch-57-avg-23-context-2-max-sym-per-frame-1-use-averaged-model-2022-11-18-12-54-02:38:greedy_search 9.81 best for dev
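For reference, decoding logs named like the ones above would come from runs roughly along these lines (a sketch using the usual icefall decode.py options; the exact flags in the AISHELL recipe may differ):
./pruned_transducer_stateless7/decode.py \
--epoch 23 \
--avg 2 \
--use-averaged-model 1 \
--decoding-method greedy_search \
--max-sym-per-frame 1 \
--context-size 2 \
--exp-dir pruned_transducer_stateless7/exp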
I think base-lr=0.05 is too big for max-duration=200 with 8 GPUs.
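If the learning rate is indeed the issue, one option is to rerun the same command with a smaller --base-lr (train.py exposes this flag, with 0.05 as the default; the value 0.03 and the exp-dir name below are only illustrative guesses, not tuned settings):
./pruned_transducer_stateless7/train.py \
--world-size 8 \
--num-epochs 90 \
--use-fp16 1 \
--max-duration 200 \
--base-lr 0.03 \
--exp-dir pruned_transducer_stateless7/exp_lr0.03 \
--feedforward-dims "1024,1024,2048,2048,1024" \
--master-port 12535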
--feedforward-dims "1024,1024,2048,2048,1024" \
There are not enough data for aishell1; could you use a smaller model?
E.g., 1024,1024,1536,1536,1024
Thanks for the guidance; I will try it later.
Using the smaller model, the result still overfits. The best greedy-search result on dev is 5.69.
Training command is:
./pruned_transducer_stateless7/train.py \
--world-size 4 \
--num-epochs 90 \
--use-fp16 1 \
--max-duration 120 \
--exp-dir pruned_transducer_stateless7/exp_1536 \
--feedforward-dims "1024,1024,1536,1536,1024" \
--master-port 12535
The tensorboard log is available at https://tensorboard.dev/experiment/xRxLLk6sQum59HKmq82WMQ/
Some WER summaries are here:
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-20-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.69
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-50-avg-20-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.77
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-30-avg-20-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.84
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-30-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.97
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-40-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.98
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-80-avg-20-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 5.98
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-90-avg-30-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 6.09
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-90-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 6.14
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-test-greedy_search-epoch-20-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 6.2
./pruned_transducer_stateless7/exp_1536/greedy_search/wer-summary-dev-greedy_search-epoch-50-avg-10-context-2-max-sym-per-frame-1-use-averaged-model.txt:greedy_search 6.21
So a smaller model leads to a lower WER. Maybe you can try an even smaller model.
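For example, the feedforward dimensions could be shrunk further, roughly as in the command below (the dims and the exp-dir name are only illustrative, not tested settings):
./pruned_transducer_stateless7/train.py \
--world-size 4 \
--num-epochs 90 \
--use-fp16 1 \
--max-duration 120 \
--exp-dir pruned_transducer_stateless7/exp_1024 \
--feedforward-dims "1024,1024,1024,1024,1024" \
--master-port 12535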
I think so too. I will train some smaller models once there are free GPU servers.
The corresponding PR is here: https://github.com/k2-fsa/icefall/pull/962