Text encoder training, then merge to UNet: best practices and settings

Open • rafstahelin opened this issue 1 year ago • 1 comment

Does anybody have a link to a thread on this pre-training method? I have heard you should set the LR for the text encoder (TE) to 1e-6, and the same for the UNet. That sounds a bit low. What method should be used to merge? Any leads would be highly appreciated.

Jun 29 '24 10:06 rafstahelin

Good luck. Ben never answers a straight question. As for my own experiments, 1e-6 and 2e-6 only work for faces, and only with 10-20 images max, text encoder around 350-450. Can you clarify "merge"?

Jul 18 '24 01:07 qudabear
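
For reference, the learning rates mentioned in this thread (1e-6 for the text encoder, 1e-6 or 2e-6 for the UNet) can be written as separate parameter groups. This is only a minimal sketch: `StageConfig` and `build_param_groups` are hypothetical names, not part of any trainer, and the placeholder strings stand in for real model parameters. The resulting list of dicts has the same shape that PyTorch optimizers accept for per-group options, but nothing here is a recommended setting.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    """Hypothetical per-module settings; names are illustrative only."""
    text_encoder_lr: float = 1e-6    # value quoted in the thread
    unet_lr: float = 1e-6            # "Unet same", per the original question
    train_text_encoder: bool = True  # set False to leave the text encoder frozen


def build_param_groups(text_encoder_params, unet_params, cfg):
    """Return one param-group dict per trainable module, each with its own lr."""
    groups = []
    if cfg.train_text_encoder:
        groups.append({"params": list(text_encoder_params), "lr": cfg.text_encoder_lr})
    groups.append({"params": list(unet_params), "lr": cfg.unet_lr})
    return groups


# Placeholder strings stand in for real parameter tensors.
groups = build_param_groups(["te.w"], ["unet.w1", "unet.w2"], StageConfig(unet_lr=2e-6))
for g in groups:
    print(len(g["params"]), g["lr"])
```

With real models, the two placeholder lists would be replaced by each module's parameters, and the list could be passed to an optimizer that supports per-group learning rates.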