fast-stable-diffusion
Text encoder training, then merge to UNet: best practices and settings
Does anybody have a link to a thread on this pre-training method? I have heard you should set the LR for the text encoder to 1e-6, and the same for the UNet. That sounds a bit low. What method is used to merge? Any leads would be highly appreciated.
Good luck. Ben never answers a straight question. As for my own experiments, learning rates of 1e-6 and 2e-6 only work for faces, and only with 10-20 images max, with the text encoder at around 350-450. Can you clarify what you mean by "merge"?
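
The ranges mentioned in this reply can be written down as a small Python sketch, which may help when comparing runs. Everything named here is hypothetical: `TrainingSettings`, its fields, and `within_reported_range` are illustrative and are not parameters of the fast-stable-diffusion notebook. It also assumes that "350-450" refers to text encoder training steps, which the reply does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container for the settings discussed in this thread."""
    unet_lr: float
    text_encoder_lr: float
    num_images: int
    text_encoder_steps: int  # assumption: "350-450" means training steps


def within_reported_range(s: TrainingSettings) -> bool:
    """Check settings against the ranges one poster reported working for faces."""
    lr_ok = all(1e-6 <= lr <= 2e-6 for lr in (s.unet_lr, s.text_encoder_lr))
    images_ok = 10 <= s.num_images <= 20
    steps_ok = 350 <= s.text_encoder_steps <= 450
    return lr_ok and images_ok and steps_ok


settings = TrainingSettings(
    unet_lr=1e-6,
    text_encoder_lr=1e-6,
    num_images=15,
    text_encoder_steps=400,
)
print(within_reported_range(settings))
```

These bounds come from one person's experiments on faces, so treat them as a starting point rather than a rule.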