
AISHELL1 with cut_set.normalize_loudness

Open lifeiteng opened this issue 1 year ago • 0 comments

I used cut_set.normalize_loudness because the loudness of the aishell audio files is low: https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173

                if args.prefix == "aishell":
                    # NOTE: the loudness of aishell audio files is around -33
                    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
                    cut_set = cut_set.normalize_loudness(
                        target=-20.0, affix_id=True
                    )

But the model's accuracy drops a lot, and I have not figured out why yet. [image]
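For reference, here is a rough sketch of what moving a signal from about -33 to -20 means in terms of gain. It is only an illustration, not what lhotse does internally: it uses a plain RMS level, and the helper names (`rms_dbfs`, `gain_to_target`, `clipped_fraction`) and the synthetic sine signal are made up for this example.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def gain_to_target(current_db, target_db):
    """Linear amplitude factor that moves a level from current_db to target_db."""
    return 10.0 ** ((target_db - current_db) / 20.0)


def clipped_fraction(samples, gain):
    """Fraction of samples whose scaled magnitude would exceed full scale (1.0)."""
    return sum(1 for s in samples if abs(s * gain) > 1.0) / len(samples)


# Hypothetical quiet signal (an assumption, not aishell data):
# one second of a 440 Hz sine at 16 kHz with peak amplitude 0.03.
sample_rate = 16000
samples = [
    0.03 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

level = rms_dbfs(samples)
gain = gain_to_target(level, -20.0)
scaled = [s * gain for s in samples]

print(f"level before: {level:.1f} dB, gain: {gain:.2f}x")
print(f"level after: {rms_dbfs(scaled):.1f} dB")
print(f"clipped fraction: {clipped_fraction(samples, gain):.3f}")
```

A 13 dB increase is a gain of roughly 4.5x in amplitude. Speech has much larger peaks relative to its average level than a sine, so checking the clipped fraction on a few real utterances might be a cheap first sanity check; whether that is related to the accuracy drop is not known.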

Ref:

  • https://github.com/lhotse-speech/lhotse/pull/1016
  • https://github.com/lhotse-speech/lhotse/pull/1029
    • The bug is not triggered because the sample_rate of aishell is 16000.

lifeiteng, Apr 15 '23 01:04