
Training of the text encoder and changing the learning rate

Open 1blackbar opened this issue 3 years ago • 11 comments

Shivam added training of the text encoder and a better learning rate for better results. Any plans to add the same changes? https://github.com/ShivamShrirao/diffusers/commits/main/examples/dreambooth

1blackbar avatar Oct 19 '22 16:10 1blackbar

The feature was added yesterday in the main diffusers repo; I will implement it soon.

TheLastBen avatar Oct 19 '22 16:10 TheLastBen

Did you experiment with learning_rate=1e-6?

I'm testing the --train_text_encoder argument right now to make sure there is no memory error.

TheLastBen avatar Oct 19 '22 16:10 TheLastBen

https://wandb.ai/psuraj/dreambooth/reports/Dreambooth-training-analysis--VmlldzoyNzk0NDc3

I'm currently experimenting with 1e-6 and 2e-6; overall I think the results are better. In this repo I sometimes got random eye colours, but with his new code I'm getting accurate colours like in the photos. It's very new so I'm still testing, but I can already tell it is better. He is using --sample_batch_size=4.

With this repo I had the best results with 2400 steps and 25 images at 5e-6. The new 2e-6 setting needs more experiments: I had some bad runs from Shiv's repo with 1200 steps, 12 images and 2e-6, where the likeness was poor, but 1e-6 with 2400 steps and 25 images had OK results. Overall I'm more interested in text encoder training than in changing the learning rate from the old one; his use of batch size might have something to do with it all as well.

With fewer than 1500 steps I failed to get decent likeness with his repo or this one, but maybe I didn't give it enough images. I try to give it about 80-120 steps per image, or divide the step count by 100 and throw in another image or two (or take some away if the likeness is poor, so it focuses longer on likeness). Overall I'm still figuring this out, but 2400 steps with 25 images was the best so far, a couple of times in a row, with this repo and 5e-6. But I also train on class only, so my settings might not be good for everything.
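The rule of thumb above (roughly 80-120 training steps per instance image, or step count divided by 100 to size the image set) can be sketched as a small helper. This is purely hypothetical code encoding the heuristic from this comment, not part of either repo:

```python
def recommended_steps(num_images, steps_per_image=100):
    """~80-120 training steps per instance image; 100 is the midpoint."""
    return num_images * steps_per_image

def recommended_images(total_steps, steps_per_image=100):
    """Inverse rule: divide the step count by ~100 to size the image set."""
    return max(1, total_steps // steps_per_image)

print(recommended_steps(25))     # 2500, close to the 2400 steps used above
print(recommended_images(2400))  # 24, close to the 25 images used above
```

With 25 images this suggests ~2500 steps, which matches the 2400-step / 25-image sweet spot reported here.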

Another one: 1800 steps, 20 images, 1e-6, good likeness. 5 body shots, 15 headshots, so this one got good (not great) likeness and editability.

I'll switch Gmail accounts and test your repo with the new code now.

1blackbar avatar Oct 19 '22 19:10 1blackbar

OK, tested your repo with 2400 steps and 25 images: way too much overfitting with 2e-6. I can't change the style; it goes into a painting but not really that artist's style, just a typical painting style. Also I can only do images framed like my training images, so mostly headshots. I think maybe 4e-6 would be fine; I'll try that and maybe even 6e-6 just to see what happens. OK, finished 4e-6 and it's still overfit, but less; I can't change the style to Boris Vallejo though, so still pretty bad overfitting. Trying 6e-6 now. OK, 6e-6 with 25 images and 2400 steps is an even worse overfit... pretty much useless, and can't change the style.

Tested 2e-6 with 10 images and 1000 steps: the result is nice stylisation, with a slight identity leak with some painters, but overall fast and kind of OK. The results are a bit all over the place with some subjects, since 1200 steps and 12 images had poor likeness despite being close to this one. Oh well... the best I've had so far, up until yesterday, was with 5e-6 and 25 images / 2400 steps, a couple of times; I haven't tried that with today's new code. I'm posting the results here also for myself, so I can come back and see what I already tested. So with 2e-6 the 1000-step / 10-image ratio is the best I've had, but I'll try 1200 steps / 25 images now.

OK, 2e-6 with 1500 steps and 25 images is the winner so far for likeness and editability. I will test even more steps until I start to overfit and can't change the style again. It also looks like it's better to have more images than not enough: 25 works fine so far, even if it doesn't train on all of them equally, and 10 images was just not enough. Plus I'm not a fan of having under 10 images, because you won't get much pose and head-angle variation, and I like to do low-angle headshots looking up, etc.

1blackbar avatar Oct 19 '22 22:10 1blackbar

Thanks for the feedback, great help.

TheLastBen avatar Oct 20 '22 04:10 TheLastBen

So now, using 5e-6 and removing --train_text_encoder should give me the exact same results as 2 days ago, or was the code significantly changed?

1blackbar avatar Oct 20 '22 04:10 1blackbar

@1blackbar you'll get the same results

TheLastBen avatar Oct 20 '22 07:10 TheLastBen

I went overboard with 57 images, 5e-6 and 1800 steps; training went berserk and overfit with muddy results. It would be really helpful to have images generated every 100 steps or so, so we can track which training rates work best. Do you think that's possible within the memory limits on Colab, to have both training and the ability to run inference?
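The periodic-preview idea could be scheduled something like this. A hypothetical sketch (not a feature of either repo): decide in advance at which steps training would pause to generate preview images, so the point where overfitting sets in becomes visible.

```python
def preview_steps(max_train_steps, every=100):
    """Steps at which training would pause to generate preview images,
    e.g. every 100 steps as suggested in the comment above."""
    return list(range(every, max_train_steps + 1, every))

steps = preview_steps(1800, every=100)
print(steps[0], steps[-1], len(steps))  # 100 1800 18
```

Inside the training loop this would just be `if step in preview_set: generate_samples(...)`, with the sampling done under no-grad to keep Colab memory usage manageable.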

Results log so far:

- 1.5 model, 4000 steps, 48 images, class male, 2e-6: likeness is good but stylisation is very weird; it doesn't do the painter's style but goes into a generic painting mode, so Frazetta doesn't look like his paintings. Not recommended.
- 800 steps, 8 images, 1e-6: likeness is poor. Not recommended.
- 1.5 model, 1700 steps, 20 images, 2e-6: poor stylisation and I think it overfits. Not recommended.
- 1.5 model, 1600 steps, 30 images, 2e-6: pretty great results, maybe better than 1500 steps / 25 images (1.5 model).

1blackbar avatar Oct 20 '22 07:10 1blackbar

Trained on Shiv's repo today; he changed the code and the VAE is merged into the ckpt, so you don't need a separate vae.pt file. Trained on 4 images and 800 steps, and it gives pretty good results very fast. He reproduced the DreamBooth results with the dog using his default settings:

```
--pretrained_model_name_or_path=$MODEL_NAME \
--pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="photo of sks {CLASS_NAME}" \
--class_prompt="photo of a {CLASS_NAME}" \
--seed=1337 \
--resolution=512 \
--train_batch_size=1 \
--train_text_encoder \
--mixed_precision="fp16" \
--use_8bit_adam \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=50 \
--sample_batch_size=4 \
--max_train_steps=800
```


So for the learning rate, it's just better to keep it low to avoid overfitting. He has it at 1e-6, but like I wrote, he has the sample batch size at 4; that's why his results are quick with such a slow rate. But how do I reproduce the same settings in this repo?

1blackbar avatar Oct 23 '22 09:10 1blackbar

By default the sample batch size is 4, and as for the new VAE, it is already merged in the download-model cell.

TheLastBen avatar Oct 23 '22 09:10 TheLastBen

1500 steps, 2e-6, 200 autogenerated class images, 28 instance images

(four result images attached)

TheLastBen avatar Oct 23 '22 09:10 TheLastBen