
What do z_dim and c_dim stand for?

Open Hu-chengyang opened this issue 3 years ago • 4 comments

Dear PhD, could you tell me what z_dim: 64 and c_dim: 256 in config/model/default stand for? And what does n_embeddings: 512 in config/model/default stand for? Thank you very much.

Hu-chengyang • Sep 07 '22 12:09

Hi, all three of these variables relate to the content encoder. z_dim is the dimension of the acoustic units (z) in the VQ codebook; c_dim is the dimension of the continuous vectors output by the LSTM (g-net in the paper), which takes z as input; and n_embeddings is the number of acoustic units in the VQ codebook.

Wendison • Sep 11 '22 09:09
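
For readers following along, here is a minimal pure-Python sketch of how the three config values in the answer above relate to one another. It is not the repository's code: the helper names `make_codebook`, `quantize`, and `describe_shapes` are hypothetical, the codebook is filled with random numbers, and nearest-neighbour lookup by squared L2 distance is assumed as the quantization rule.

```python
import random

# Values quoted in the question (config/model/default).
Z_DIM = 64          # length of each acoustic-unit vector z in the VQ codebook
C_DIM = 256         # length of each continuous vector produced by the LSTM (g-net)
N_EMBEDDINGS = 512  # number of acoustic units (rows) in the VQ codebook


def make_codebook(n_embeddings, z_dim, seed=0):
    """Hypothetical random codebook: n_embeddings rows, each of length z_dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(z_dim)]
            for _ in range(n_embeddings)]


def quantize(frame, codebook):
    """Index of the codebook row closest to `frame` (assumed rule: squared L2)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(frame, row))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def describe_shapes(n_frames):
    """Per-utterance shapes implied by the config, ignoring the batch axis."""
    return {
        "codebook": (N_EMBEDDINGS, Z_DIM),  # one row per acoustic unit
        "z": (n_frames, Z_DIM),             # one quantized unit per frame
        "c": (n_frames, C_DIM),             # one LSTM output vector per frame
    }


codebook = make_codebook(N_EMBEDDINGS, Z_DIM)
index = quantize(codebook[42], codebook)  # a frame equal to unit 42 maps back to it
shapes = describe_shapes(n_frames=64)
```

In this sketch, changing n_embeddings alters only the number of rows in the codebook, while z_dim and c_dim set the width of the vectors before and after the LSTM.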

Thank you!

Hu-chengyang • Sep 11 '22 14:09

In model_encoder.py, class Encoder(nn.Module), def forward(self, mels): z = self.conv(mels.float()) # (bz, 80, 128) -> (bz, 512, 128/2)

What does 128 mean? What variable does it represent? Thank you very much.

Hu-chengyang • Sep 13 '22 12:09

128 is the number of mel-spectrogram frames used for training; it corresponds to 1.28 s of waveform.

Wendison • Sep 20 '22 11:09
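
The arithmetic behind that answer can be sketched as follows. The 10 ms frame shift is an inference from "128 frames = 1.28 s", not a value stated in the thread, and the halving of the frame count is read off the code comment quoted in the question, `(bz, 80, 128) -> (bz, 512, 128/2)`. The names `frames_to_seconds` and `frames_after_conv` are hypothetical.

```python
N_FRAMES = 128     # mel-spectrogram frames per training segment (from the thread)
N_MELS = 80        # mel bins per frame, per the quoted shape comment
HOP_MS = 10        # assumed frame shift: 1.28 s / 128 frames = 10 ms


def frames_to_seconds(n_frames, hop_ms=HOP_MS):
    """Approximate waveform duration covered by n_frames mel frames."""
    return n_frames * hop_ms / 1000


def frames_after_conv(n_frames, downsample=2):
    """Frame count after the encoder conv, assuming the 128 -> 128/2 comment holds."""
    return n_frames // downsample


segment_seconds = frames_to_seconds(N_FRAMES)   # duration of one training segment
encoder_frames = frames_after_conv(N_FRAMES)    # time steps seen after self.conv
input_shape = (N_MELS, N_FRAMES)                # per-utterance input, batch axis omitted
```

Under these assumptions, each time step after the conv covers roughly two input frames, or about 20 ms of audio.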