
Performance degradation when training with jsonl manifests (i.e. DynamicBucketingSampler)

Open pkufool opened this issue 3 years ago • 4 comments

Creating an issue here to discuss this problem.

I re-ran the pruned_transducer_stateless4 baseline with DynamicBucketingSampler, the results are as follows:

All WERs are reported as test-clean & test-other.

| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|-------|---------------|------------------|----------------------|----------|
| Trained with jsonl manifests | 2.93 & 6.61 | 2.99 & 6.55 | 3.17 & 6.78 | --epoch 25 --avg 15, tensorboard |
| Baseline, trained with json manifests (by zengwei) | 2.69 & 6.64 | 2.66 & 6.6 | 2.62 & 6.57 | --epoch 30 --avg 6, details, tensorboard |
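
For context, a minimal sketch of the two setups being compared, using the Lhotse API as of mid-2022 (paths and parameter values are placeholders):

```python
from lhotse import load_manifest, load_manifest_lazy
from lhotse.dataset import BucketingSampler, DynamicBucketingSampler

# json + BucketingSampler: the whole CutSet is materialized in memory,
# so shuffle=True permutes it globally before bucketing.
cuts = load_manifest("cuts_train.json.gz")  # placeholder path
sampler = BucketingSampler(cuts, max_duration=300, shuffle=True, num_buckets=30)

# jsonl + DynamicBucketingSampler: the cuts are streamed lazily from disk,
# and shuffling happens only within a bounded buffer, so the on-disk order
# of the manifest still shows through.
cuts_lazy = load_manifest_lazy("cuts_train.jsonl.gz")  # placeholder path
sampler_lazy = DynamicBucketingSampler(
    cuts_lazy, max_duration=300, shuffle=True, num_buckets=30
)
```

If that buffered-shuffle picture is the cause, pre-shuffling the jsonl on disk should recover the baseline, which the experiments below test.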

I conducted two more experiments: (1) use json manifests and BucketingSampler (i.e. reproduce the original baseline); (2) shuffle the jsonl manifests and use DynamicBucketingSampler. Since a jsonl manifest stores one cut per line, a line-level shuffle is safe. The shuffle command:

```bash
gunzip -c jsonl.gz | shuf | gzip -c > new_jsonl.gz
```
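
Equivalently in Python, assuming the manifest fits in memory (`CutSet.from_file`, `shuffle`, and `to_file` are standard Lhotse calls; the filenames are placeholders):

```python
from lhotse import CutSet

# Eagerly load the cuts, shuffle them, and write a new jsonl.gz manifest.
cuts = CutSet.from_file("cuts.jsonl.gz")  # placeholder filename
cuts.shuffle().to_file("cuts_shuffled.jsonl.gz")
```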

The results are:

| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|-------|---------------|------------------|----------------------|----------|
| json + BucketingSampler | 2.64 & 6.35 | 2.66 & 6.26 | 2.59 & 6.17 | --epoch 25 --avg 11 |
| shuf jsonl + DynamicBucketingSampler | 2.66 & 6.34 | 2.74 & 6.33 | 2.65 & 6.27 | --epoch 25 --avg 7 |

pkufool avatar Aug 04 '22 05:08 pkufool

I ran another experiment using a globally shuffled jsonl, i.e. shuffling across all three LibriSpeech subsets rather than within a single manifest. The shuffle command is:

```bash
cat clean-100.jsonl clean-360.jsonl other-500.jsonl | shuf | gzip -c > libri-960.jsonl.gz
```

A summary of all my experiments:

| setup | greedy_search | fast_beam_search | modified_beam_search | comments |
|-------|---------------|------------------|----------------------|----------|
| Earlier baseline, json + BucketingSampler (by zengwei) | 2.69 & 6.64 | 2.66 & 6.6 | 2.62 & 6.57 | --epoch 30 --avg 6 |
| json + BucketingSampler | 2.64 & 6.35 | 2.66 & 6.26 | 2.59 & 6.17 | --epoch 25 --avg 11 |
| jsonl + DynamicBucketingSampler | 2.93 & 6.61 | 2.99 & 6.55 | 3.17 & 6.78 | --epoch 25 --avg 15 |
| shuffled jsonl + DynamicBucketingSampler | 2.66 & 6.34 | 2.74 & 6.33 | 2.65 & 6.27 | --epoch 25 --avg 7 |
| globally shuffled jsonl + DynamicBucketingSampler | 2.68 & 6.39 | 2.69 & 6.33 | 2.69 & 6.43 | --epoch 25 --avg 9 |

pkufool avatar Aug 08 '22 15:08 pkufool

Thanks, this is very interesting. I'm surprised that you are seeing a degradation with the globally shuffled vs. the locally shuffled version (test-other WER with modified_beam_search: 6.43 vs 6.27). The gap is small, but large enough to think it could be non-random; I would have expected the globally shuffled version to work better.

I was considering developing the streaming shuffling further in Lhotse to improve the sampler randomness, but maybe it's not needed.
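
For readers less familiar with the term: streaming shuffling approximates a full shuffle with a fixed-size buffer, so each element can only be displaced by roughly the buffer size. A minimal sketch of the idea, my own illustration rather than Lhotse's actual implementation:

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def streaming_shuffle(
    items: Iterable[T], bufsize: int = 10000, seed: int = 42
) -> Iterator[T]:
    """Approximately shuffle an iterable using O(bufsize) memory.

    Fill a buffer first; then, for each incoming item, emit a random
    buffered element and store the new item in its place.
    """
    rng = random.Random(seed)
    buf: List[T] = []
    for item in items:
        if len(buf) < bufsize:
            buf.append(item)
            continue
        idx = rng.randrange(bufsize)
        yield buf[idx]
        buf[idx] = item
    # Drain the remaining buffer in random order.
    rng.shuffle(buf)
    yield from buf
```

Because an element never travels more than about `bufsize` positions, structure that exists on disk (e.g. cuts grouped by subset or sorted by duration) survives at scales larger than the buffer, which would explain why pre-shuffling the manifest helps.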

pzelasko avatar Aug 09 '22 01:08 pzelasko

Here are the results I have using the master code.

The model was trained WITHOUT pre-shuffling.

| decoding method | test-clean | test-other | comment |
|-----------------|------------|------------|---------|
| greedy search | 2.69 | 6.45 | --epoch 30, --avg 8, --use-averaged-model true |

It does not indicate that results using jsonl with the dynamic bucketing sampler are worse than those obtained using json with the bucketing sampler.

Posting the results from the first comment below for easy reference:

[Screenshot: results table from the first comment]

Training command:

```bash
export CUDA_VISIBLE_DEVICES="2,3,4,5,6,7"

./pruned_transducer_stateless4/train.py \
  --world-size 6 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless4/exp \
  --full-libri 1 \
  --max-duration 300 \
  --save-every-n 8000 \
  --keep-last-k 20 \
  --average-period 100
```

Decoding command:

```bash
for epoch in 30; do
  for avg in 6 7 8 9 10 11; do
    ./pruned_transducer_stateless4/decode.py \
        --epoch $epoch \
        --avg $avg \
        --use-averaged-model 1 \
        --exp-dir ./pruned_transducer_stateless4/exp \
        --max-duration 600 \
        --decoding-method greedy_search
  done
done
```

csukuangfj avatar Aug 10 '22 02:08 csukuangfj

Note: The training command is identical to the one listed in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#librispeech-bpe-training-results-pruned-stateless-transducer-4

csukuangfj avatar Aug 10 '22 02:08 csukuangfj