DiffusionCLIP Different results in ViT-B/16 and ViT-L/14@336px

Open TerryMelody opened this issue 11 months ago • 0 comments

Hello dear authors, I have a small question about model choice and parameter optimization. I set the edit to "Human->Zombie"; here are the results:

a. ViT-B/16: a single epoch is already enough to get a good result, as follows.

b. ViT-L/14@336px: when I use this model, the results look strange. I don't know whether different parameters should be set to fine-tune the diffusion model with it. (left->right: epochs 1-5)
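
For context, here is a minimal sketch of a directional CLIP loss, assuming the fine-tuning objective compares the image-embedding shift with the text-embedding shift ("Human" to "Zombie"). The function names and toy vectors are hypothetical and not taken from the DiffusionCLIP code. Real embeddings would come from the chosen CLIP encoder: 512-dimensional for ViT-B/16 and 768-dimensional for ViT-L/14@336px (which also expects 336x336 inputs), so loss values from the two encoders are not directly comparable.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def directional_clip_loss(img_src, img_gen, txt_src, txt_tgt):
    """1 - cos(delta_image, delta_text), computed on precomputed embeddings.

    img_src / img_gen: embeddings of the original and edited image.
    txt_src / txt_tgt: embeddings of the source and target prompts.
    All four are assumed to come from the same CLIP encoder.
    """
    delta_img = [g - s for g, s in zip(img_gen, img_src)]
    delta_txt = [t - s for t, s in zip(txt_tgt, txt_src)]
    return 1.0 - cosine_similarity(delta_img, delta_txt)


# Toy 3-d vectors standing in for real CLIP embeddings (hypothetical values).
aligned = directional_clip_loss([1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1])
orthogonal = directional_clip_loss([1, 0, 0], [2, 0, 0], [0, 0, 1], [0, 1, 1])
```

In this toy case `aligned` is 0 because the image shift matches the text shift, and `orthogonal` is 1 because the two shifts are perpendicular. Since the loss depends only on directions in the encoder's own embedding space, swapping the encoder changes the optimization landscape, which may be one reason the same settings behave differently.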

Mar 20 '24 03:03 TerryMelody
