VQMIVC
What do z_dim and c_dim stand for?
Hi, could you tell me what z_dim: 64 and c_dim: 256 in config/model/default stand for? And what does n_embeddings: 512 in the same config stand for? Thank you very much.
Hi, all three variables relate to the content encoder. z_dim is the dimension of each acoustic unit (z) in the VQ codebook, c_dim is the dimension of the continuous vectors produced by the LSTM (g-net in the paper) that takes z as input, and n_embeddings is the number of acoustic units in the VQ codebook.
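To make the three values concrete, here is a minimal shape-bookkeeping sketch. It is not the repository's code: the class name ContentEncoderConfig, the function content_encoder_shapes, and the (batch, time, dim) layout are assumptions made for illustration. Only the three config values and their meanings come from the answer above.

```python
from dataclasses import dataclass


@dataclass
class ContentEncoderConfig:
    """Hypothetical container for the three values in config/model/default."""
    n_embeddings: int = 512  # number of acoustic units in the VQ codebook
    z_dim: int = 64          # dimension of each acoustic unit z
    c_dim: int = 256         # dimension of the LSTM (g-net) output c


def content_encoder_shapes(cfg, batch_size, n_frames):
    """Return the tensor shapes implied by the config.

    n_frames is the number of frames at the encoder output.
    The (batch, time, dim) layout is an assumption for illustration.
    """
    return {
        # One row per acoustic unit, each of size z_dim.
        "codebook": (cfg.n_embeddings, cfg.z_dim),
        # Each frame is represented by one z_dim-sized acoustic unit.
        "z": (batch_size, n_frames, cfg.z_dim),
        # The LSTM maps each z_dim vector to a c_dim vector.
        "c": (batch_size, n_frames, cfg.c_dim),
    }


shapes = content_encoder_shapes(ContentEncoderConfig(), batch_size=2, n_frames=64)
for name, shape in shapes.items():
    print(name, shape)
```

With the default values, the codebook holds 512 vectors of size 64, and the LSTM widens each 64-dimensional z to a 256-dimensional c.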
Thank you!
In model_encoder.py, class Encoder(nn.Module), def forward(self, mels): z = self.conv(mels.float()) # (bz, 80, 128) -> (bz, 512, 128/2)
What does 128 mean? Which variable does it represent? Thank you very much.
128 is the number of mel-spectrogram frames in each training segment; it corresponds to 1.28 s of waveform, i.e. 10 ms per frame.
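The arithmetic behind that answer can be sketched as follows. The 128 frames, the 1.28 s duration, and the halving of the time axis (128/2 in the code comment) come from this thread. The 16 kHz sampling rate is an assumption used only to express the frame shift in samples, and the function names are hypothetical.

```python
N_FRAMES = 128            # mel frames per training segment (from the thread)
SEGMENT_SECONDS = 1.28    # duration of those frames (from the thread)
ASSUMED_SAMPLE_RATE = 16000  # assumption, not stated in the thread


def frame_shift_seconds(n_frames=N_FRAMES, seconds=SEGMENT_SECONDS):
    """Time covered by one mel frame."""
    return seconds / n_frames


def hop_length_samples(sample_rate=ASSUMED_SAMPLE_RATE):
    """Frame shift expressed in samples at the assumed sampling rate."""
    return round(frame_shift_seconds() * sample_rate)


def encoder_output_frames(n_frames=N_FRAMES, downsample=2):
    """Frames left after the conv stack halves the time axis (128 -> 128/2)."""
    return n_frames // downsample


print(frame_shift_seconds())    # seconds per mel frame
print(hop_length_samples())     # samples per mel frame at the assumed rate
print(encoder_output_frames())  # frames at the encoder output
```

Under these numbers, each encoder output frame spans two mel frames, so one acoustic unit covers roughly 20 ms of audio.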