
Why are Facodec and Ns3_facodec different?

Open ndhuynh02 opened this issue 1 year ago • 0 comments

I am looking at the model code in the two folders, facodec and ns3_facodec. I know that ns3_facodec is the training code for Facodec. However, I am seeing some differences between the two architectures:

  • First of all, the official Facodec has no LSTMs in either the Encoder or the Decoder.
  • Secondly, the timbre encoder is somewhat different. Both use a Transformer, but the two implementations are not the same.
  • The generator loss is a weighted combination of multiple losses. But looking at the Appendix of the NaturalSpeech3 paper, the weights clearly do not match the ones in the paper; they look like those of the DAC paper instead.
  • The upsample and downsample rates are not the same. For the official Ns3_codec, they are [2, 4, 5, 5], while for the other one they are [2, 4, 8, 8]. This also means the hop_lengths for the mel spectrogram are 200 and 300, respectively.
  • In the training code, the audio data has a sampling rate of 24 kHz, while the original paper works on 16 kHz audio.
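To make the last two points concrete, here is a quick sanity check of the arithmetic. It is only a sketch: it assumes that the hop length equals the product of the per-layer rates, and that the 16 kHz audio goes with hop length 200 and the 24 kHz audio with hop length 300. The helper names `total_stride` and `frame_rate` are made up for illustration and are not taken from the Amphion code.

```python
from math import prod


def total_stride(rates):
    """Overall down/upsampling factor of a stack of strided layers (hypothetical helper)."""
    return prod(rates)


def frame_rate(sample_rate, hop_length):
    """Number of codec/mel frames per second of audio (hypothetical helper)."""
    return sample_rate / hop_length


# Rates quoted in the issue.
official_stride = total_stride([2, 4, 5, 5])  # 2 * 4 * 5 * 5 = 200
other_stride = total_stride([2, 4, 8, 8])     # 2 * 4 * 8 * 8 = 512

# Assumed pairing: 16 kHz with hop 200, 24 kHz with hop 300.
official_fps = frame_rate(16_000, 200)  # 80.0 frames per second
training_fps = frame_rate(24_000, 300)  # 80.0 frames per second

print(official_stride, other_stride, official_fps, training_fps)
```

Under these assumptions, [2, 4, 5, 5] does give a hop length of 200, but [2, 4, 8, 8] multiplies out to 512 rather than 300, so the quoted rates and the quoted hop length for the second model may not both be right. The two (sampling rate, hop length) pairs do give the same frame rate of 80 frames per second, which could be why the hop length changes along with the sampling rate.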

ndhuynh02, Nov 10 '24 02:11