AudioDec
Is it missing some activation functions between some layers?
Thanks for your work. I trained the model on my own dataset and ran into the same problem as issue #7. When I inspected the model, I noticed some differences in the AutoEncoder compared with other implementations:
- Before the encoder output is fed into the Projector, shouldn't there be an activation function?
- Before each ConvTranspose1d, shouldn't there be an activation function?
- Shouldn't the Decoder's final output go through a tanh activation?
Other popular implementations all include these, so I added them:
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/projector.py#L50
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L62
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
- add a tanh() after https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
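For clarity, here is a minimal PyTorch sketch of the decoder tail with the proposed changes. The class and parameter names are illustrative only, not AudioDec's actual modules; the point is the placement of the pre-activations and the final tanh:

```python
import torch
import torch.nn as nn

class DecoderTail(nn.Module):
    """Hypothetical decoder tail illustrating the proposed activations."""

    def __init__(self, in_channels=64, out_channels=1):
        super().__init__()
        # proposed: activation BEFORE the ConvTranspose1d upsampling layer
        self.pre_act = nn.LeakyReLU(0.2)
        self.upsample = nn.ConvTranspose1d(
            in_channels, in_channels // 2, kernel_size=4, stride=2, padding=1
        )
        # proposed: activation BEFORE the final output convolution
        self.out_act = nn.LeakyReLU(0.2)
        self.out_conv = nn.Conv1d(
            in_channels // 2, out_channels, kernel_size=7, padding=3
        )

    def forward(self, x):
        x = self.upsample(self.pre_act(x))
        x = self.out_conv(self.out_act(x))
        # proposed: tanh on the final output to bound the waveform to [-1, 1]
        return torch.tanh(x)
```

A similar pre-activation would go before the Projector's input convolution. The rationale is that stacking linear layers (conv + conv) without a nonlinearity in between collapses them into a single linear map, and the tanh keeps the generated waveform in the valid amplitude range.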
After adding these and retraining, I saw some improvement on unseen datasets over your baseline, when training only the AutoEncoder with discriminators and without fine-tuning it with AudioDec.
BTW, I trained the model only on LibriSpeech and AISHELL at a 16 kHz sampling rate, and tested it on a separate clean TTS dataset after 160k training steps. Once training finishes (800k steps total), I will compare the final results, upload some demos, and share my training config.