
Objects From Dreambooth Training Are Not in Output

Open rajbala opened this issue 2 years ago • 3 comments

Describe the bug

I am trying to use the train_dreambooth.py script with the example images of a dog.

When I run inference on the model output by the training, the dog never appears in the rendered images at all. I have tried all sorts of combinations of learning_rate and max_train_steps, but it makes no difference.

Instead of the dog in front of the Eiffel Tower, I just get photographs of the Eiffel Tower.

Been banging my head against a wall for days trying to figure this out, so any pointers would be appreciated! :)

Reproduction

export MODEL_NAME="stabilityai/stable-diffusion-2-1"
export OUTPUT_DIR="/dev/saved_models/dog-sdv21"
export INSTANCE_DIR="/dev/instance_images/dog/"
export TRAINING_RESOLUTION="512"
export LEARNING_RATE="5e-6"
export MAX_TRAIN_STEPS="400"
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128"
accelerate launch /dev/diffusers/examples/dreambooth/train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="asdlfjsdlfkjs2342342" \
--resolution=$TRAINING_RESOLUTION \
--train_batch_size=1 \
--gradient_accumulation_steps=2 --gradient_checkpointing \
--enable_xformers_memory_efficient_attention \
--learning_rate=$LEARNING_RATE \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=$MAX_TRAIN_STEPS

My prompt for inference is this:

instance_token = "asdlfjsdlfkjs2342342"
prompt = "A photograph of {} in front of the Eiffel Tower".format(instance_token)

Logs

I do not get any errors during training or inferencing.

System Info

  • diffusers version: 0.13.0.dev0
  • Platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Huggingface_hub version: 0.12.0
  • Transformers version: 4.26.1
  • Accelerate version: 0.16.0
  • xFormers version: 0.0.16
  • Using GPU in script?: 3090ti
  • Using distributed or parallel set-up in script?: No

rajbala avatar Feb 16 '23 03:02 rajbala

BTW, I tried upgrading xformers and tried training without xformers and it still does not work for me.

rajbala avatar Feb 16 '23 13:02 rajbala

@williamberman could you take a look here?

patrickvonplaten avatar Feb 16 '23 14:02 patrickvonplaten

Is there anything I can do to help troubleshoot what the source of the problem may be?

rajbala avatar Feb 17 '23 15:02 rajbala

@williamberman Hi. Anything I can do to help identify the source of the problem? I have tried all sorts of changes, from different operating systems to various versions of CUDA, and nothing seems to work.

rajbala avatar Feb 23 '23 23:02 rajbala

cc @williamberman gentle re-ping here

patrickvonplaten avatar Mar 06 '23 13:03 patrickvonplaten

@rajbala please update to the tip of diffusers and update xformers to one of the 0.0.17 pre-release versions:

https://huggingface.co/docs/diffusers/v0.14.0/en/optimization/xformers#installing-xformers
https://github.com/huggingface/diffusers/issues/2234
https://github.com/facebookresearch/xformers/issues/631
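
In practice that means installing diffusers from the main branch and a pre-release xformers, along the lines of (a sketch; see the linked docs for the exact recommended versions):

pip install --upgrade git+https://github.com/huggingface/diffusers.git
pip install --pre -U xformers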

williamberman avatar Mar 06 '23 20:03 williamberman

I was finally able to figure this out after a great deal of trial and error, but I don't quite understand it. I installed the CUDA drivers (versions 12.0 and 12.1) supplied by Nvidia rather than the ones available through the Ubuntu repositories.
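
For anyone else hitting this, a quick sanity check after switching drivers (the exact install steps will depend on your distro):

nvidia-smi   # shows the active driver version and the CUDA version it supports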

rajbala avatar Mar 07 '23 20:03 rajbala

Appreciate your manual debugging @rajbala. We'll stay on the lookout for similar reports and, if we get any, dig more into the driver issue.

williamberman avatar Mar 16 '23 10:03 williamberman