
Training

inferense opened this issue 4 years ago · 3 comments

Thanks for the implementation. A few questions about training:

  1. Does training on another RGB dataset like COCO require any changes besides the hyperparameters of the priors?
  2. When it comes to conditioning on class labels / captions, I'm not quite sure about the `y=None` in the forward pass of the priors. Does this need to be changed to refer to the one-hot encoded labels / captions? Thanks!

inferense avatar Aug 25 '20 10:08 inferense

I assume you are referring to the VQVAE2 implementation, since you mention priors. To answer your questions:

  1. Yes, you should be able to train on COCO by just creating another dataset. You only need to specify `input_dims`, which is currently specified for each dataset in the `fetch_vqvae_dataloader` function. It is then saved to the model config and automatically scaled down to the dimensions of the latent maps when training each prior.
  2. `y` is the conditioning one-hot vector that your dataloader outputs. It is set to `None` if you want to sample without conditioning on a class. For all my experiments, I did condition by feeding a one-hot vector; you can see that at line 80 of `vqvae.py` (a minimal dataset sketch follows this list).
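To make that concrete, here is a minimal sketch of the kind of dataset the prior training expects, i.e. one that yields an `(image, one-hot y)` pair. The names `base_dataset` and `n_classes` are illustrative, not from the repo:

```python
import torch
from torch.utils.data import Dataset

class OneHotConditioningDataset(Dataset):
    """Illustrative wrapper: yields (image, one-hot label).

    Assumes `base_dataset` returns (image_tensor, class_index);
    `n_classes` sets the size of the one-hot conditioning vector.
    """
    def __init__(self, base_dataset, n_classes):
        self.base = base_dataset
        self.n_classes = n_classes

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, label = self.base[idx]
        y = torch.zeros(self.n_classes)
        y[label] = 1.0  # one-hot vector passed to the prior as y
        return x, y
```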

Hope this helps.

kamenbliznashki avatar Aug 27 '20 11:08 kamenbliznashki

Thanks! And correct, I'm referring to VQVAE2 (sorry for not specifying earlier).

I've trained the VQVAE (with my own script) and extracted the codes. Since I'm using COCO, I decided to use a word embedding instead of a one-hot vector. Going through `vqvae_prior.py`, I'm curious about the `n_cond_classes` value. It seems like it's mainly used for a linear transformation in `GatedResidualLayer`? Any suggestions on how it might work with an embedding vector instead of a one-hot?

inferense avatar Aug 31 '20 20:08 inferense

`n_cond_classes` sets the input dimension of a linear projection layer that maps the one-hot encoding of the class to the internal dimension (`n_channels`) of the gated residual layers, i.e. it's the size of the one-hot 'embedding'. You can set `n_cond_classes` to the size of your embedding vector and the model should work.

It is also used in the dataset constructor to set the size of the one-hot encoding, but since you are using your own dataset constructor you don't need to worry about that bit.
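A hedged sketch of that projection, just to illustrate the idea (the variable names and sizes here are assumptions, not the repo's code):

```python
import torch
import torch.nn as nn

n_channels = 128   # internal channel width of the gated residual layers
embed_dim = 300    # e.g. dimensionality of your word embeddings

# What n_cond_classes configures internally: a linear map from the
# conditioning vector to the layer's channel dimension.
proj = nn.Linear(embed_dim, n_channels)

# With one-hot conditioning, y has size n_cond_classes = number of classes.
# With an embedding, pass the embedding itself and set
# n_cond_classes = embed_dim; the projection works the same way.
y = torch.randn(1, embed_dim)   # a (batch, embed_dim) caption embedding
cond = proj(y)                  # (batch, n_channels), added inside each layer
```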

kamenbliznashki avatar Sep 03 '20 10:09 kamenbliznashki