vall-e
AISHELL1 with cut_set.normalize_loudness
I used `cut_set.normalize_loudness` because the loudness of AISHELL audio files is low (see https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173):
```python
if args.prefix == "aishell":
    # NOTE: the loudness of aishell audio files is around -33
    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
    cut_set = cut_set.normalize_loudness(
        target=-20.0, affix_id=True
    )
```
But the model's accuracy drops a lot, and I have not figured out why.
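To show how large this adjustment is, here is a minimal sketch of the arithmetic behind loudness normalization. It assumes a single constant gain per recording, which is a simplification and not a description of lhotse's implementation. The helper names `loudness_gain` and `apply_gain` are hypothetical. Moving from about -33 to -20 is a +13 dB shift, which means multiplying every sample by roughly 4.47. Any sample above about 0.22 of full scale therefore ends up outside [-1.0, 1.0].

```python
def loudness_gain(measured_lufs: float, target_lufs: float) -> float:
    # Integrated loudness is on a dB-like scale: scaling the amplitude by g
    # shifts it by 20*log10(g) dB, so the required linear gain is 10 ** (delta / 20).
    delta_db = target_lufs - measured_lufs
    return 10.0 ** (delta_db / 20.0)


def apply_gain(samples: list[float], gain: float) -> list[float]:
    # Constant gain with no limiting, so values may leave the [-1.0, 1.0] float range.
    return [s * gain for s in samples]


# Rough AISHELL numbers from the comment in tokenizer.py: about -33 measured, -20 target.
gain = loudness_gain(-33.0, -20.0)  # +13 dB
print(f"{gain:.3f}")  # 4.467

peak = max(abs(s) for s in apply_gain([0.05, -0.2, 0.3], gain))
print(peak > 1.0)  # True (0.3 * 4.467 is about 1.34)
```

Whether out-of-range peaks like these are clipped, limited, or kept as floats depends on how the normalized audio is stored and later read. That detail is worth checking alongside the accuracy drop.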
Ref:
- https://github.com/lhotse-speech/lhotse/pull/1016
- https://github.com/lhotse-speech/lhotse/pull/1029
- The bug is not triggered here because the sample rate of AISHELL is 16000 Hz.