When and how is HunyuanVAE replaced by DC-AE?
The technical report mentions that the model is initially trained with HunyuanVAE and then switched to DC-AE for training at a higher compression rate. How is the timing of the VAE switch determined? Also, when switching the VAE, are only the first and last layers unfrozen while the remaining parameters are kept fixed until training stabilizes, after which everything is unfrozen, or are all parameters unfrozen immediately after the switch?
-
Switching from Hunyuan VAE to DCAE: In fact, the timing of this switch was partly constrained by our limited computational resources. At high resolutions, we found that our resources could not support training to full convergence, so after reaching a certain level of quality we decided to try a high-compression AE, which we felt might be more meaningful for the community.
-
Freezing the parameters of the first and last layers: it is unclear whether this refers to training the DCAE itself or training the diffusion model with DCAE.
• If the former: although the DCAE report mentions this technique, we did not adopt it. Since the architectures of the Hunyuan VAE and DCAE are different, we trained the DCAE from scratch with all parameters unfrozen.
• If the latter: based on our previous experience with adaptation, we also trained the diffusion model with all parameters unfrozen.
Thanks for your reply. My current understanding is that you first used HunyuanVAE to train the DiT, and after training for some time, switched to DC-AE to continue training the DiT. When switching to DC-AE, the initial and final layers were reinitialized. Is this understanding correct?
Yes.
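The adaptation confirmed above (keep the trained DiT backbone, reinitialize only the latent-facing first and last layers to match the new VAE's channel count, and leave everything unfrozen) can be sketched as below. This is a minimal illustration, not the authors' actual code: the module names (`TinyDiT`, `patch_embed`, `out_proj`, `adapt_to_new_vae`) and the latent channel counts (16 for the Hunyuan VAE latent, 32 for DC-AE) are assumptions for the example.

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    # Minimal stand-in for a DiT: latent-facing input/output layers
    # plus an interior "backbone" whose trained weights we want to keep.
    def __init__(self, latent_ch: int, hidden: int = 64):
        super().__init__()
        self.patch_embed = nn.Linear(latent_ch, hidden)  # first layer (latent -> hidden)
        self.backbone = nn.Linear(hidden, hidden)        # stands in for the transformer blocks
        self.out_proj = nn.Linear(hidden, latent_ch)     # last layer (hidden -> latent)

def adapt_to_new_vae(model: TinyDiT, new_latent_ch: int) -> TinyDiT:
    """Reinitialize only the first/last layers for the new latent channel
    count, copying the interior backbone weights from the old model.
    All parameters stay trainable (nothing is frozen)."""
    new_model = TinyDiT(new_latent_ch, hidden=model.backbone.in_features)
    new_model.backbone.load_state_dict(model.backbone.state_dict())
    return new_model

# Usage: train with an (assumed) 16-ch Hunyuan VAE latent, then switch to 32-ch DC-AE.
old = TinyDiT(latent_ch=16)
new = adapt_to_new_vae(old, new_latent_ch=32)
assert torch.equal(new.backbone.weight, old.backbone.weight)  # backbone carried over
assert new.patch_embed.in_features == 32                      # first layer reinitialized
assert new.out_proj.out_features == 32                        # last layer reinitialized
```

The key point the sketch captures is that the backbone transfers across VAEs while the input/output projections cannot, because their shapes depend on the latent channel count of the specific autoencoder.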