AudioDec
Is it missing some activation functions between some layers?
Thanks for your work. I trained the model on my own dataset and ran into the same problem as issue #7. When I inspected the model, I noticed some differences in the AutoEncoder compared with other implementations:
- Before the encoder output is fed into the Projector, shouldn't there be an activation function?
- Before each ConvTranspose1d, shouldn't there be an activation function?
- Shouldn't the Decoder's final output go through a tanh activation?
Other popular implementations all include these, so I added them:
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/projector.py#L50
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L62
- add an activation function before https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
- add a tanh() after https://github.com/facebookresearch/AudioDec/blob/9b498385890b38de048f2db535c2fbf8cbeea80b/models/autoencoder/modules/decoder.py#L120
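For clarity, here is a minimal PyTorch sketch of the decoder tail with the proposed changes. The class and parameter names are illustrative only, not AudioDec's actual modules; the point is the placement of the pre-activations and the final tanh:

```python
import torch
import torch.nn as nn

class DecoderTail(nn.Module):
    """Hypothetical decoder tail illustrating the proposed activations."""

    def __init__(self, in_channels=64, out_channels=1):
        super().__init__()
        # proposed: activation BEFORE the ConvTranspose1d upsampling layer
        self.pre_act = nn.LeakyReLU(0.2)
        self.upsample = nn.ConvTranspose1d(
            in_channels, in_channels // 2, kernel_size=4, stride=2, padding=1
        )
        # proposed: activation BEFORE the final output convolution
        self.out_act = nn.LeakyReLU(0.2)
        self.out_conv = nn.Conv1d(
            in_channels // 2, out_channels, kernel_size=7, padding=3
        )

    def forward(self, x):
        x = self.upsample(self.pre_act(x))
        x = self.out_conv(self.out_act(x))
        # proposed: tanh on the final output to bound the waveform to [-1, 1]
        return torch.tanh(x)
```

A similar pre-activation would go before the Projector's input convolution. The rationale is that stacking linear layers (conv + conv) without a nonlinearity in between collapses them into a single linear map, and the tanh keeps the generated waveform in the valid amplitude range.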
After adding these and retraining, I saw some improvement on unseen datasets over your baseline, when training only the AutoEncoder with discriminators and without fine-tuning it with AudioDec.
BTW, I trained the model only on LibriSpeech and AISHELL at a 16 kHz sampling rate, and tested it on a separate clean TTS dataset after 160k training steps. Once training finishes (800k steps total), I will compare the final results, upload some demos, and share my training config.