Performance degradation when training with jsonl manifests (i.e. DynamicBucketingSampler)
Creating an issue here to discuss this problem.
I re-ran the pruned_transducer_stateless4 baseline with DynamicBucketingSampler; the results are as follows (in each table below, a cell shows test-clean & test-other WER in %):
| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|---|---|---|---|---|
| Trained with jsonl manifests | 2.93 & 6.61 | 2.99 & 6.55 | 3.17 & 6.78 | --epoch 25 --avg 15 tensorboard |
| Baseline, trained with json manifests (by zengwei) | 2.69 & 6.64 | 2.66 & 6.6 | 2.62 & 6.57 | --epoch 30 --avg 6 details, tensorboard |
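For context, here is a minimal sketch of how a jsonl manifest is typically hooked up to DynamicBucketingSampler with lhotse. This is not the exact code used in these experiments; the manifest path and parameter values are placeholders.

```python
# Sketch: lazily open a jsonl cut manifest and feed it to DynamicBucketingSampler.
# The path and parameter values are placeholders, not the exact experiment settings.
from lhotse import load_manifest_lazy
from lhotse.dataset import DynamicBucketingSampler

# jsonl manifests can be read lazily, i.e. in the order they are stored on disk.
cuts_train = load_manifest_lazy("data/fbank/cuts_train.jsonl.gz")

train_sampler = DynamicBucketingSampler(
    cuts_train,
    max_duration=300,  # same spirit as --max-duration in the training command below
    shuffle=True,      # for a lazy manifest, shuffling happens in a streaming fashion
    num_buckets=30,
    drop_last=True,
)
```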
I conducted two more experiments: (1) use json manifests with BucketingSampler (i.e., reproduce the original baseline); (2) shuffle the jsonl manifests and use DynamicBucketingSampler. The shuffle command was:
```bash
gunzip -c jsonl.gz | shuf | gzip -c > new_jsonl.gz
```
The results are:
| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|---|---|---|---|---|
| json + BucketingSampler | 2.64 & 6.35 | 2.66 & 6.26 | 2.59 & 6.17 | --epoch 25 --avg 11 |
| shuf jsonl + DynamicBucketingSampler | 2.66 & 6.34 | 2.74 & 6.33 | 2.65 & 6.27 | --epoch 25 --avg 7 |
I ran another experiment using a globally shuffled jsonl manifest; the shuffle command was:
```bash
cat clean-100.jsonl clean-360.jsonl other-500.jsonl | shuf | gzip -c > libri-960.jsonl.gz
```
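The same global shuffle can also be done with lhotse itself instead of shell tools. This is just a rough sketch, assuming the combined manifest fits in memory and that lhotse's `combine`, `CutSet.shuffle`, and `to_file` behave as their names suggest; the file names simply mirror the command above.

```python
# Sketch: globally shuffle the combined 960h cut manifest in Python instead of shuf.
# Everything is loaded eagerly, so the full manifest is held in memory while shuffling.
from lhotse import combine, load_manifest

cuts = combine(
    load_manifest("clean-100.jsonl"),
    load_manifest("clean-360.jsonl"),
    load_manifest("other-500.jsonl"),
)
cuts.shuffle().to_file("libri-960.jsonl.gz")
```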
All my experiments are summarized below:
| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|---|---|---|---|---|
| Earlier baseline, json + BucketingSampler (by zengwei) | 2.69 & 6.64 | 2.66 & 6.6 | 2.62 & 6.57 | --epoch 30 --avg 6 |
| json + BucketingSampler | 2.64 & 6.35 | 2.66 & 6.26 | 2.59 & 6.17 | --epoch 25 --avg 11 |
| jsonl + DynamicBucketingSampler | 2.93 & 6.61 | 2.99 & 6.55 | 3.17 & 6.78 | --epoch 25 --avg 15 |
| shuffled jsonl + DynamicBucketingSampler | 2.66 & 6.34 | 2.74 & 6.33 | 2.65 & 6.27 | --epoch 25 --avg 7 |
| global shuffled jsonl + DynamicBucketingSampler | 2.68 & 6.39 | 2.69 & 6.33 | 2.69 & 6.43 | --epoch 25 --avg 9 |
Thanks, this is very interesting. I'm surprised that you are seeing a degradation with the globally shuffled vs. the locally shuffled version (test-other WER with modified_beam_search: 6.43 vs. 6.27); it's small, but large enough to think it could be non-random. I would have expected the globally shuffled version to work better.
I was considering developing the streaming shuffling further in Lhotse to improve the sampler randomness, but maybe it's not needed.
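To make the streaming-shuffling idea concrete, here is a small self-contained sketch of a bounded shuffle buffer (illustrative only, not lhotse's actual implementation): items can only change places with other items that are in the buffer at the same time, so a manifest stored in dataset order (clean-100, then clean-360, then other-500) is only randomized locally unless it is pre-shuffled or the buffer is very large.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def streaming_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in pseudo-random order using a bounded buffer.

    Only items present in the buffer together can swap positions, so the
    output is randomized within roughly `buffer_size` items, not globally.
    """
    rng = random.Random(seed)
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Emit a random element; the buffer stays at ~buffer_size items.
            yield buf.pop(rng.randrange(len(buf)))
    # Drain whatever is left at the end.
    rng.shuffle(buf)
    yield from buf


# With a small buffer, early items can never end up near the end of the stream.
print(list(streaming_shuffle(range(20), buffer_size=5, seed=1)))
```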
Here are the results I got using the current master code.
The model was trained WITHOUT pre-shuffling.
| decoding method | test-clean | test-other | comment |
|---|---|---|---|
| greedy search | 2.69 | 6.45 | --epoch 30, --avg 8, --use-averaged-model true |
These results do not indicate that jsonl manifests with DynamicBucketingSampler are worse than json manifests with BucketingSampler.
Posting the results from the first comment below for easy reference.
Training command:
```bash
export CUDA_VISIBLE_DEVICES="2,3,4,5,6,7"

./pruned_transducer_stateless4/train.py \
  --world-size 6 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless4/exp \
  --full-libri 1 \
  --max-duration 300 \
  --save-every-n 8000 \
  --keep-last-k 20 \
  --average-period 100
```
Decoding command:
```bash
for epoch in 30; do
  for avg in 6 7 8 9 10 11; do
    ./pruned_transducer_stateless4/decode.py \
      --epoch $epoch \
      --avg $avg \
      --use-averaged-model 1 \
      --exp-dir ./pruned_transducer_stateless4/exp \
      --max-duration 600 \
      --decoding-method greedy_search
  done
done
```
Note: the training command is identical to the one listed in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#librispeech-bpe-training-results-pruned-stateless-transducer-4