
Dreambooth finetune FLUX dev CLIPTextModel

Wuyiche opened this issue · 5 comments

Describe the bug

ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0

I used four A100 GPUs to do a full fine-tune of the FLUX.1-dev model, following https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

I used the toy dog dataset (5 images) for fine-tuning and ran into a problem with max_position_embeddings in CLIPTextModel:

Reproduction

[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1812, in <module>
[rank1]:     main(args)
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1351, in main
[rank1]:     instance_prompt_hidden_states, instance_pooled_prompt_embeds, instance_text_ids = compute_text_embeddings(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1339, in compute_text_embeddings
[rank1]:     prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 963, in encode_prompt
[rank1]:     pooled_prompt_embeds = _encode_prompt_with_clip(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 937, in _encode_prompt_with_clip
[rank1]:     prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=False)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1056, in forward
[rank1]:     return self.text_model(
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 947, in forward
[rank1]:     hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 283, in forward
[rank1]:     raise ValueError(
[rank1]: ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0
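The error indicates that the CLIP text encoder ends up with max_position_embeddings set to 0, while the text_encoder config shipped with FLUX.1-dev normally sets it to 77. As a first check (a minimal diagnostic sketch, not part of the training script; it assumes the model can be resolved from the Hub or from a local snapshot path), the config can be inspected on its own:

from transformers import CLIPTextConfig

# Load only the text encoder's config from the FLUX.1-dev repo (or the local path
# passed via --pretrained_model_name_or_path) and check the value the model is built with.
config = CLIPTextConfig.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder",
)
print(config.max_position_embeddings)  # expected: 77; 0 points to a bad local config.json

If this prints 77 against the Hub repo but 0 against the local cache, the cached text_encoder/config.json is the likely culprit.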

I tried changing max_position_embeddings when loading the CLIPTextModel, but it doesn't help:

text_encoder_one = class_one.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="text_encoder",
    revision=args.revision,
    variant=args.variant,
    max_position_embeddings=77,
    ignore_mismatched_sizes=True,
)
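For reference, keyword arguments such as max_position_embeddings passed to from_pretrained are normally forwarded to the model config, so this override should not be needed when the checkpoint's config.json is intact. A more explicit variant of the same workaround (an illustrative sketch only; it uses CLIPTextConfig/CLIPTextModel directly instead of the script's class_one variable) loads the config first, forces the expected value, and hands the corrected config back to from_pretrained:

from transformers import CLIPTextConfig, CLIPTextModel

# Load the config shipped with the checkpoint and override the suspicious field.
config = CLIPTextConfig.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="text_encoder",
    revision=args.revision,
)
config.max_position_embeddings = 77  # standard CLIP text sequence length

# Rebuild the text encoder from the corrected config; the checkpoint's position
# embeddings are sized for 77 positions, so no weight resizing should be needed.
text_encoder_one = CLIPTextModel.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="text_encoder",
    revision=args.revision,
    variant=args.variant,
    config=config,
)

If this also fails, re-downloading the text_encoder subfolder (or clearing the cached snapshot) is probably the simpler path, since the published config should never carry a value of 0.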

My training script is as follows:

export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-flux"

accelerate launch train_dreambooth_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

Logs


System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.10.16
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.1
  • Transformers version: 4.49.0
  • Accelerate version: 1.4.0
  • PEFT version: 0.14.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: 4× NVIDIA A100-SXM4-40GB, 40960 MiB each
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Wuyiche · Feb 28 '25