
Are some activation functions missing between layers?

BridgetteSong opened this issue 11 months ago · 6 comments

Thanks for your work. I trained the model on my own dataset and ran into the same problem as issue #7. When I inspected the model, I found some differences in the AutoEncoder:

  • Is an activation function needed before encoder_output is fed into the Projector?
  • Is an activation function needed before each ConvTranspose1d?
  • Should a tanh activation be added to the Decoder's final output? (The usual pattern is sketched below.)
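
For reference, here is a minimal sketch of the pattern used in GAN-based vocoder generators such as HiFi-GAN: a LeakyReLU before every upsampling ConvTranspose1d and a tanh on the final waveform. The module structure, channel sizes, and upsample rates below are illustrative only, not taken from AudioDec:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative only: the usual GAN-vocoder generator pattern (HiFi-GAN style),
# with a nonlinearity before every upsampling ConvTranspose1d and a tanh on
# the final output. Sizes and rates here are made up, not AudioDec's.
class TinyGenerator(nn.Module):
    def __init__(self, in_channels=64, upsample_rates=(8, 8, 4)):
        super().__init__()
        self.ups = nn.ModuleList()
        ch = in_channels
        for r in upsample_rates:
            self.ups.append(
                nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * r, stride=r, padding=r // 2)
            )
            ch //= 2
        self.out = nn.Conv1d(ch, 1, kernel_size=7, padding=3)

    def forward(self, x):  # x: (batch, in_channels, frames)
        for up in self.ups:
            x = F.leaky_relu(x, 0.1)  # activation BEFORE each upsample
            x = up(x)
        x = F.leaky_relu(x, 0.1)
        return torch.tanh(self.out(x))  # bound the waveform to [-1, 1]
```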

Other popular implementations all include these, so I added them (sketched after this list):

  • added an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/projector.py#L50
  • added an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L62
  • added an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
  • added a tanh() after https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
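
Concretely, the edits look roughly like this. These are simplified stand-ins for the Projector and Decoder modules linked above, not the real classes, and ELU is just my choice of nonlinearity; what matters is where the activations go:

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Simplified stand-in: a 1x1 projection from encoder features to the code space."""
    def __init__(self, in_channels, code_dim):
        super().__init__()
        self.act = nn.ELU()  # added: activation before the projection conv
        self.conv = nn.Conv1d(in_channels, code_dim, kernel_size=1)

    def forward(self, x):
        return self.conv(self.act(x))

class DecoderUpBlock(nn.Module):
    """Simplified stand-in for one decoder upsampling block."""
    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        self.act = nn.ELU()  # added: activation before ConvTranspose1d
        self.up = nn.ConvTranspose1d(in_channels, out_channels,
                                     kernel_size=2 * stride, stride=stride,
                                     padding=stride // 2)

    def forward(self, x):
        return self.up(self.act(x))

def decoder_output(final_conv: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Added: tanh after the decoder's final convolution."""
    return torch.tanh(final_conv(x))
```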

After adding these and retraining, I saw some improvement on unseen datasets over your baseline, in the setting where I train only the AutoEncoder with discriminators and do not fine-tune it with AudioDec.
BTW, I trained the model only on LibriSpeech and AISHELL at a 16 kHz sampling rate, and tested it on another clean TTS dataset after 160k training steps. When training finishes (800k steps total), I will compare the final results, upload some demos, and share my training config.

BridgetteSong · Jul 25 '23