The normalization settings of input audio

Open Ther-nullptr opened this issue 1 year ago • 1 comments

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

In wav2vec 2.0 and HuBERT, the config task.normalize is set to False (meaning the input audio is not normalized), but in data2vec it is set to True, and the original paper also mentions this. Will this have a big effect on experimental results?

Code

What have you tried?

What's your environment?

  • fairseq Version (e.g., 1.0 or main):
  • PyTorch Version (e.g., 1.0)
  • OS (e.g., Linux):
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Ther-nullptr avatar Sep 01 '22 13:09 Ther-nullptr

@alexeib

Ther-nullptr avatar Sep 01 '22 13:09 Ther-nullptr

It won't have much of an effect, but you have to match the feature extractor to the normalization setting:

  • normalize in dataloader -> layer norm in feature extractor
  • no normalization in dataloader -> group norm in first block of feature extractor + feature_grad_mult = 0.1 (rescale feature extractor grads by 0.1)

alexeib avatar Sep 18 '22 06:09 alexeib
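
The pairing described in the reply above can be sketched roughly as follows. This is only an illustration, not fairseq code: `normalize_waveform` and `matching_extractor_settings` are hypothetical helpers, per-utterance zero-mean/unit-variance scaling is assumed to be what task.normalize does, and the key names `model.extractor_mode` (with values `layer_norm` and `default`) and `model.feature_grad_mult` are assumptions modeled on the wav2vec 2.0 configs that should be checked against your own config.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Scale one utterance to zero mean and (approximately) unit variance.

    Assumption: this mirrors what normalizing in the dataloader does.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # biased variance
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in samples]


def matching_extractor_settings(normalize):
    """Return feature extractor settings that match the dataloader setting.

    Hypothetical helper; the key names are assumptions, not a verified API.
    """
    if normalize:
        # normalized input -> layer norm in the feature extractor
        return {"task.normalize": True, "model.extractor_mode": "layer_norm"}
    # raw input -> group norm in the first block, feature extractor grads scaled by 0.1
    return {
        "task.normalize": False,
        "model.extractor_mode": "default",
        "model.feature_grad_mult": 0.1,
    }


if __name__ == "__main__":
    print(normalize_waveform([0.1, 0.3, -0.2, 0.4]))
    print(matching_extractor_settings(True))
    print(matching_extractor_settings(False))
```

The point is that the two settings change together: switching task.normalize without also switching the feature extractor norm leaves the model in a configuration that neither recipe uses.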