Collaborative-Diffusion: About the training epochs of the VAE model and uni-modal model for text-to-face

Hello! Following the instructions you provided, I am trying to retrain the VAE model and the uni-modal model for text-to-face on an RTX 3090. May I ask how many epochs you trained each of these two models for? Or do you decide when to stop training based on the visualizations in reconstructions_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png and samples_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png? Looking forward to your answer.

Nov 03 '23 02:11 ourpubliccodes

Hi, for the VAE, training for 50-150 epochs usually gives satisfactory checkpoints. You can monitor both the reconstruction results and the reconstruction loss. Uni-Modal diffusion models usually take 100-200 epochs.
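Since the advice is to watch the reconstruction loss rather than rely on a fixed epoch count, one simple way to make "the loss has stopped improving" concrete is a plateau check over per-epoch loss values. This is a minimal sketch, not part of the Collaborative-Diffusion code: it assumes you have already collected one validation reconstruction loss per epoch into a Python list (e.g. from your training logs), and the function name `has_plateaued` and the `patience`/`min_delta` values are hypothetical choices you would tune yourself.

```python
from typing import Sequence


def has_plateaued(losses: Sequence[float], patience: int = 10, min_delta: float = 1e-4) -> bool:
    """Hypothetical helper: return True if the best loss in the last `patience`
    epochs is not at least `min_delta` lower than the best loss seen before them."""
    if len(losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(losses[:-patience])  # best loss before the recent window
    best_recent = min(losses[-patience:])  # best loss inside the recent window
    # No meaningful improvement in the recent window -> treat as a plateau
    return best_recent > best_before - min_delta


# Loss flattens out around 0.25 -> plateau detected
print(has_plateaued([1.0, 0.5, 0.3, 0.25, 0.249, 0.2495, 0.2492], patience=3, min_delta=0.01))
# Loss still dropping quickly -> keep training
print(has_plateaued([1.0, 0.8, 0.6, 0.4, 0.2], patience=2, min_delta=0.01))
```

A plateau in the loss is only a hint; it is still worth checking the saved reconstruction images before picking a checkpoint. If your training run uses PyTorch Lightning, its `EarlyStopping` callback (`monitor`, `patience`, `min_delta`) implements a similar rule.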

Dec 28 '23 07:12 ziqihuangg

Hello, may I ask whether a mask file is also required for training a text-to-face single diffusion model? I trained the text-to-face single diffusion model on a new dataset without providing a mask file, and found that training only improved the image within the default square area.

Jul 16 '24 01:07 jupytera

Hi, if you are referring to the text-to-image model, then no mask is needed.

Jul 16 '24 06:07 ziqihuangg