FastSpeech2

Discrepancy in the Number of Decoder Layers

Open · shreeshailgan opened this issue 11 months ago · 0 comments

In Section 3.1, under Model Configuration, the paper states that the decoder consists of 4 FFT Transformer blocks. However, the provided checkpoints (and the model.yaml configs) have 6 FFT Transformer blocks in the decoder. What explains this discrepancy? Did you later observe improvements in performance using 6 decoder blocks instead of 4? A quick way to confirm what the released configs actually specify is shown below.
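For reference, a minimal sketch of how one might check the decoder depth in the shipped config; the config path and the `transformer.decoder_layer` key are assumptions about this repo's layout, not confirmed by the paper:

```python
import yaml

# Load the model config distributed with the checkpoints
# (path and key names are illustrative assumptions).
with open("config/LJSpeech/model.yaml") as f:
    model_config = yaml.safe_load(f)

# Section 3.1 of the paper reports 4 decoder FFT blocks;
# this prints the value the released config actually uses.
print(model_config["transformer"]["decoder_layer"])
```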

shreeshailgan · Mar 14 '24