FastSpeech2
Discrepancy in the Number of Decoder Layers
In Section 3.1, under Model Configuration, the paper states that the decoder consists of 4 FFT Transformer blocks. However, the provided checkpoints (and the model.yaml configs) have 6 FFT Transformer blocks in the decoder.
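For reference, the kind of setting I mean looks roughly like this (an illustrative sketch only; the actual key names in model.yaml may differ):

```yaml
# Sketch of the relevant model.yaml section -- key names are assumed, not copied verbatim
transformer:
  encoder_layer: 4   # 4 FFT blocks, as described in the paper
  decoder_layer: 6   # checkpoints/configs appear to use 6 FFT blocks here
```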
Why the discrepancy? Did you later observe improvements in performance using 6 decoder blocks instead of 4?