fast-stable-diffusion

Multiple subjects with no image caption

Open Curlypla opened this issue 2 years ago • 10 comments

Can you add the possibility to train several subjects using a JSON file, as in https://github.com/ShivamShrirao/diffusers/commit/351f3b6206f0453706346fd34a337cdf6ac6ef07?

The captioned images approach looks unnecessarily complicated, unlike this commit, where you just have to specify which folder belongs to each subject and which to its class.
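For readers unfamiliar with the JSON approach being requested, here is a minimal sketch of what such a per-subject file could look like. The key names (`instance_prompt`, `class_prompt`, `instance_data_dir`, `class_data_dir`), the prompts, and all paths are assumptions for illustration; check the linked commit for the exact format it expects.

```python
import json

# Hypothetical layout: one entry per subject, each pointing at its own
# instance folder and class folder. Key names and paths are assumptions.
concepts_list = [
    {
        "instance_prompt": "photo of subjecta man",
        "class_prompt": "photo of a man",
        "instance_data_dir": "data/subjecta",
        "class_data_dir": "data/man",
    },
    {
        "instance_prompt": "photo of subjectb woman",
        "class_prompt": "photo of a woman",
        "instance_data_dir": "data/subjectb",
        "class_data_dir": "data/woman",
    },
]

# Serialize to the text that would be saved as a .json file.
concepts_json = json.dumps(concepts_list, indent=4)
print(concepts_json)
```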

Curlypla avatar Oct 24 '22 12:10 Curlypla

It's not really complicated: just select all the images and rename one to, for example, man_johnsmth.jpg, and the rest would become man_johnsmth (1).jpg, man_johnsmth (2).jpg, etc. The instance prompt would then be automatically set to "photo of man johnsmth", with johnsmth being the instance name.

Plus you can caption images to specify objects included in each image if you want advanced training.

TheLastBen avatar Oct 24 '22 13:10 TheLastBen

Yes but what should I put in "INSTANCE_NAME", "SUBJECT_TYPE" or "CLASS_DIR" since there are several of them?

Curlypla avatar Oct 24 '22 19:10 Curlypla

Because I did as you said by renaming the files with their classes and their instances and I put one of the two names in "INSTANCE_NAME" but at the end I just had the mixture of the two characters

Curlypla avatar Oct 24 '22 21:10 Curlypla

If you check the box "captioned images", it doesn't matter what you put in "INSTANCE_NAME". It won't be used for training; only the image names will be used as prompts, and "INSTANCE_NAME" then only serves as the CKPT file name.

If you have an image named jhnsmth_person_in_a_forest (5).jpg, the instance prompt would automatically be "photo of jhnsmth person in a forest". It won't include "INSTANCE_NAME"; the image names become the only source for the class and instance name.
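The naming rule described here can be sketched as follows. This is an illustration of the mapping as stated in this thread, not the notebook's actual code; `filename_to_prompt` is a hypothetical helper, and the "photo of" prefix is taken from the examples given above.

```python
import re
from pathlib import Path


def filename_to_prompt(filename: str, prefix: str = "photo of") -> str:
    """Hypothetical helper: derive an instance prompt from an image file name."""
    stem = Path(filename).stem                 # drop the extension
    stem = re.sub(r"\s*\(\d+\)$", "", stem)    # drop a trailing counter such as " (5)"
    words = stem.replace("_", " ").split()     # underscores become spaces
    return f"{prefix} {' '.join(words)}"


print(filename_to_prompt("jhnsmth_person_in_a_forest (5).jpg"))
print(filename_to_prompt("man_johnsmth.jpg"))
```

With this rule, two subjects stay distinct simply because their files carry different names, for example person_woman_emlclrk (1).jpg and person_man_wlmdfo (1).jpg.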

As for the class dir, you can mix all the class images without a problem.

I trained a model on Willem Dafoe and Emilia Clarke at the same time (person_woman_emlclrk (1).jpg, etc. and person_man_wlmdfo (1).jpg, etc.):

[4 images attached]

TheLastBen avatar Oct 25 '22 03:10 TheLastBen

I tried to train 2 people at once, both with 25 images, for 600 steps. It didn't get as good a likeness as training just a single one for about 1500 steps. Is this method capable of the same results? Should I run it for 1500 steps like usual? Also, are we able to train again on the trained ckpt if we convert it back to diffusers, so we can keep adding more subjects or "fix" the likeness of existing subjects?

1blackbar avatar Oct 25 '22 20:10 1blackbar

For the best results, use 3000 steps for 2 instances, 1500 for one. And yes, it is possible to retrain.
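As a rough sketch, the two figures given here (1500 steps for one instance, 3000 for two) fit a simple per-subject rule. Extending it linearly beyond two subjects is an assumption, and `suggested_steps` is a hypothetical helper, not part of the notebook.

```python
STEPS_PER_SUBJECT = 1500  # from the thread: 1500 steps for one subject, 3000 for two


def suggested_steps(num_subjects: int) -> int:
    """Hypothetical helper: scale training steps linearly with the subject count."""
    if num_subjects < 1:
        raise ValueError("num_subjects must be at least 1")
    return STEPS_PER_SUBJECT * num_subjects


print(suggested_steps(1))
print(suggested_steps(2))
```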

TheLastBen avatar Oct 25 '22 21:10 TheLastBen

I'll test it out. I tried 4 subjects on shiv; it works well with 6k steps, but it overfits some subjects while others can be stylised more. When will this repo support --sample_batch_size=4? I think this helps to avoid overfitting and to learn faster.

1blackbar avatar Oct 26 '22 12:10 1blackbar

The default value of --sample_batch_size is already 4; even if I didn't add it to the colab, it is set to 4.

From the training script:

    parser.add_argument(
        "--sample_batch_size", type=int, default=4, help="Batch size (per device) for sampling images."
    )
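To see why the flag does not need to be passed explicitly, here is a small self-contained sketch that reuses the same `add_argument` call on a fresh parser. Only that one argument is reproduced; the rest of the training script is not shown in this thread.

```python
import argparse

# A standalone parser with just the argument quoted above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--sample_batch_size", type=int, default=4, help="Batch size (per device) for sampling images."
)

# No flag given: argparse falls back to the declared default.
args_default = parser.parse_args([])
# Flag given: the explicit value overrides the default.
args_explicit = parser.parse_args(["--sample_batch_size=2"])

print(args_default.sample_batch_size)
print(args_explicit.sample_batch_size)
```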

TheLastBen avatar Oct 26 '22 12:10 TheLastBen

Is it also possible to generate photos showing a group of multiple trained people/subjects? Or, even though we can train on multiple subjects, can an image only contain a single subject at a time?

kartikeyporwal avatar Oct 29 '22 22:10 kartikeyporwal

It has no relation to the training; it's a limitation of the technology (for now). Even DALL-E 2 has a hard time rendering more than two specific, distinct faces in the same image. But if you play with the prompts and the weights, you might get lucky. In this example, I managed to get two trained subjects in the same pic:

[4 images attached]

Still frame of emlclrk, ((((with)))) wlmdfo laughing in The background, closeup, cinematic, 1970s film, 40mm f/2.8, real,remastered, 4k uhd, talking

negative prompt :

cartoon, fake, painting, 3d, low poly

Steps: 80, Sampler: Euler, CFG scale: 8.5, Seed: 2597262209, Size: 704x512, Model hash: fa3de41b, Denoising strength: 0.65, First pass size: 0x0
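The nested parentheses around "with" are how the prompt weights are set in this example. As a sketch, assuming the AUTOMATIC1111 web UI convention in which each pair of parentheses multiplies a token's attention by 1.1 (the thread does not name the UI, so treat this as an assumption), the effective weight can be estimated like this. `emphasis_weight` is a hypothetical helper.

```python
def emphasis_weight(token: str, factor: float = 1.1) -> float:
    """Hypothetical helper: weight implied by nested parentheses, e.g. "((with))"."""
    depth = 0
    # Peel off one matching pair of parentheses at a time.
    while len(token) >= 2 and token.startswith("(") and token.endswith(")"):
        token = token[1:-1]
        depth += 1
    return factor ** depth


print(round(emphasis_weight("((((with))))"), 4))
```

Under that assumption, four pairs give a weight of roughly 1.46, so the token is emphasised without dominating the rest of the prompt.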

TheLastBen avatar Oct 30 '22 04:10 TheLastBen