
Generated LoRA doesn't generate expected faces during inference

Open jndietz opened this issue 2 years ago • 5 comments

I generated a LoRA with the following command. My dataset_name points to a folder containing 17 images of my face plus a metadata.jsonl in the format required by 🤗 datasets. The pretrained model is a merge of two models from civitai, converted to the diffusers format.

accelerate launch --mixed_precision="fp16" /diffusers/examples/text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$1 \
  --dataset_name=$2 --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --max_train_steps=15000 \
  --checkpointing_steps=1000 \
  --validation_epochs=100 \
  --num_validation_images=0 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir=$3 \
  --max_train_samples=10
{"file_name": "001.png", "text": "This is a portrait jndietz"}
{"file_name": "002.png", "text": "This is a portrait jndietz"}
...
{"file_name": "016.png", "text": "This is a portrait jndietz"}
{"file_name": "017.png", "text": "This is a portrait jndietz"}

Then I ran a Python inference script to generate an image:

import os

import torch
from diffusers import DiffusionPipeline

# folder containing the trained LoRA attention weights
model_path = os.getcwd() + "/jd-lora"

pipe = DiffusionPipeline.from_pretrained("./d50-r50", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)  # attach the LoRA weights to the UNet
pipe.to("cuda")

pipe.enable_attention_slicing()
pipe.enable_xformers_memory_efficient_attention()

# no-op safety checker: return the images unchanged, flag nothing as NSFW
def dummy(images, **kwargs):
    return images, [False] * len(images)

pipe.safety_checker = dummy

prompt = "a portrait of jndietz"
image = pipe(prompt, width=512, height=512, num_inference_steps=40, guidance_scale=7.5).images[0]
image.save("jndietz.png")

But the output looks nothing like me. It's definitely a person, but nothing like the images I trained on.

I've been able to use textual inversion locally with success through the automatic1111 webui, without diffusers involved at all. Those embeddings come pretty close to looking exactly like me.

What could I be missing when using diffusers?

jndietz avatar Feb 16 '23 06:02 jndietz

Hey @jndietz,

Note that textual inversion is very different from LoRA DreamBooth. We should maybe look into adding textual inversion support to the LoRA training scripts at some point, though.
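
(For reference, later diffusers releases added direct loading of textual inversion embeddings, including automatic1111-style ones, via load_textual_inversion; a minimal sketch, with a hypothetical embedding path and token name:)

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("./d50-r50", torch_dtype=torch.float16).to("cuda")
# "./embeddings/jndietz.pt" and the token are placeholders for a trained embedding
pipe.load_textual_inversion("./embeddings/jndietz.pt", token="jndietz")
image = pipe("a portrait of jndietz").images[0]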

patrickvonplaten avatar Feb 16 '23 14:02 patrickvonplaten

Maybe try using train_dreambooth_lora instead?
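
(For concreteness, a sketch of how that script is typically invoked; the paths, rare-token prompt, and step count are placeholders. Note it takes a plain folder of instance images rather than a metadata.jsonl:)

accelerate launch --mixed_precision="fp16" /diffusers/examples/dreambooth/train_dreambooth_lora.py \
  --pretrained_model_name_or_path="./d50-r50" \
  --instance_data_dir="./face-images" \
  --instance_prompt="a photo of sks person" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-4 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --max_train_steps=1500 \
  --output_dir="./jd-dreambooth-lora"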

sleep2death avatar Feb 20 '23 14:02 sleep2death

@sleep2death I've tried using that one too. What is the difference between train_text_to_image_lora and train_dreambooth_lora? I always see LoRA grouped with DreamBooth, so I'm not sure what the big difference is.

jndietz avatar Feb 20 '23 16:02 jndietz

@jndietz From a glance at the code, I think the main difference is that for DreamBooth you can specify --with_prior_preservation, so a prior-preservation loss is used as regularization. Please correct me if my understanding is wrong.
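
(Those flags, appended to a train_dreambooth_lora invocation like the one sketched above; the class prompt and image count are illustrative:)

  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --class_data_dir="./class-images" \
  --class_prompt="a photo of a person" \
  --num_class_images=200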

timtaotao avatar Feb 21 '23 05:02 timtaotao

Besides, in train_dreambooth_lora the VAE, text encoder, and UNet base weights are all frozen:

# only the injected LoRA layers remain trainable
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
unet.requires_grad_(False)
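
(The trainable parameters are the LoRA attention processors injected afterwards; roughly, paraphrasing the script of that era, with the import path varying across diffusers versions:)

from diffusers.models.attention_processor import LoRAAttnProcessor

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # self-attention layers ("attn1") have no cross-attention dimension
    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    lora_attn_procs[name] = LoRAAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)

unet.set_attn_processor(lora_attn_procs)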

timtaotao avatar Feb 21 '23 06:02 timtaotao

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 18 '23 15:03 github-actions[bot]