
Bad Flux lora training result

phyllispeng123 opened this issue 1 year ago · 7 comments

Hi @kohya-ss, thank you for your detailed and excellent work on FLUX finetuning and LoRA training! I got bad results when I ran the sample LoRA training script with network_dim=32, 50 high-quality 1024x1024 input images, and max_train_epochs=50 (2,500 steps in total).

python==3.10.15
torch==2.4.0
torchmetrics==1.6.0
torchvision==0.19.0
transformers==4.44.0
accelerate==0.33.0
xformers==0.0.23.post1
diffusers==0.25.0

```
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 --main_process_port 23333 \
  flux_train_network.py \
  --pretrained_model_name_or_path /black-forest-labs/FLUX.1-schnell/flux1-schnell.safetensors \
  --clip_l /SD3/text_encoders/clip_l.safetensors \
  --t5xxl /SD3/text_encoders/t5xxl_fp16.safetensors \
  --ae /black-forest-labs/FLUX.1-schnell/ae.safetensors \
  --cache_latents_to_disk \
  --save_model_as safetensors \
  --sdpa \
  --persistent_data_loader_workers \
  --max_data_loader_n_workers 2 \
  --seed 42 \
  --gradient_checkpointing \
  --mixed_precision bf16 \
  --save_precision bf16 \
  --network_module networks.lora_flux \
  --network_dim 32 \
  --network_train_unet_only \
  --optimizer_type adamw8bit \
  --learning_rate 1e-4 \
  --cache_text_encoder_outputs \
  --cache_text_encoder_outputs_to_disk \
  --highvram \
  --max_train_epochs 50 \
  --save_every_n_epochs 1 \
  --dataset_config flux_image_50.toml \
  --output_dir /flux_unet/log/lora \
  --output_name flux-lora-name \
  --timestep_sampling shift \
  --discrete_flow_shift 3.1582 \
  --model_prediction_type raw \
  --guidance_scale 1.0
```
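The dataset config referenced above (flux_image_50.toml) is not shown here; for reference, a minimal sketch of what it might contain, assuming the standard sd-scripts dataset config layout (the directory path and values below are placeholders, not my actual file):

```toml
# Hypothetical flux_image_50.toml -- path and values are placeholders
[general]
enable_bucket = true          # bucket images by aspect ratio
caption_extension = ".txt"    # one caption file per image

[[datasets]]
resolution = 1024             # matches the 1024x1024 training images
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/path/to/50_training_images"
  num_repeats = 1
```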

I got a bad inference result when running:

```
python3 flux_minimal_inference.py \
  --ckp black-forest-labs/FLUX.1-schnell/flux1-schnell.safetensors \
  --clip_l /SD3/text_encoders/clip_l.safetensors \
  --t5xxl /SD3/text_encoders/t5xxl_fp16.safetensors \
  --ae /black-forest-labs/FLUX.1-schnell/ae.safetensors \
  --dtype bf16 \
  --prompt "A small cactus with a happy face in the Sahara desert." \
  --out /flux_unet/log/lora \
  --seed 42 \
  --flux_dtype fp8 \
  --offload \
  --lora "/flux_unet/log/lora/flux-lora-name.safetensors;1.0"
```

The comparison between the original FLUX output (upper) and the LoRA-added output (lower) is shown in the attached images.

whereas my training images are very good (an example is attached).

Can you give me some hints about it? Thank you so much!

phyllispeng123 · Dec 04 '24

Here are a few things to get you in the right direction...

  • It's been my experience that schnell is more prone to having issues in training than dev. I don't know if using dev is an option for you, but if it is, you may want to rerun using that.
  • 50 images is too many. Try 20-30 with a consistent concept, sometimes less; I've done several successful LoRAs with one image. More pictures may take longer to converge.
  • With an LR of only 0.0001, more than 2,500 steps may be needed, especially with 50 pictures. Typically, I would use an LR of 0.00015 with 20 pictures for 4,000 steps; sometimes it would converge at 2,000, and sometimes it would need the full run (see the flag sketch after this list).
  • Your sample image is AI-generated and has artifacts. These can be magnified through training. I don't know if more of your training set is AI-generated, but that could be the issue as well.
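To make the step math concrete, a rough sketch of the flags I would change relative to the command above; the epoch count is simply whatever yields about 4,000 steps at batch size 1 with num_repeats = 1, and these numbers are illustrative rather than a tested recipe:

```
# 20 images x 200 epochs = 4,000 steps at batch size 1, num_repeats 1
--learning_rate 1.5e-4
--max_train_epochs 200
```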

daileta1 · Dec 04 '24

Check your workflow to see if it includes a shift node. Old Flux workflows did not have shift at first, while the training script includes it.

sdbds · Dec 05 '24

> Check your workflow to see if it includes a shift node. Old Flux workflows did not have shift at first, while the training script includes it.

@sdbds The training script includes --timestep_sampling shift and --discrete_flow_shift 3.1582. Should I delete --timestep_sampling or change it to another sampling type? I just found that the flux-schnell scheduler has shift 1; should I change --discrete_flow_shift to 1?

phyllispeng123 · Dec 05 '24

> Check your workflow to see if it includes a shift node. Old Flux workflows did not have shift at first, while the training script includes it.

> @sdbds The training script includes --timestep_sampling shift and --discrete_flow_shift 3.1582. Should I delete --timestep_sampling or change it to another sampling type? I just found that the flux-schnell scheduler has shift 1; should I change --discrete_flow_shift to 1?

If you train schnell, it should not use any shift. Only dev uses shift; you should remove the shift-related parameters during training.
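For reference, my understanding is that --discrete_flow_shift remaps the sampled timesteps toward the noisier end of the schedule during training. A minimal sketch of the usual flow-matching shift formula (an assumption about how the option behaves, not the exact sd-scripts code):

```python
def shift_timestep(t: float, shift: float) -> float:
    """Remap a timestep t in [0, 1] with a discrete flow shift.

    shift = 1.0 is the identity mapping, which is why removing the
    shift-related options amounts to training schnell without shift.
    """
    return (t * shift) / (1.0 + (shift - 1.0) * t)

print(shift_timestep(0.5, 3.1582))  # ~0.76: mid timesteps pushed toward the noisy end
print(shift_timestep(0.5, 1.0))     # 0.5: identity, no shift
```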

sdbds · Dec 05 '24

> If you train schnell, it should not use any shift. Only dev uses shift; you should remove the shift-related parameters during training.

@sdbds I just changed the training script to --timestep_sampling uniform, --discrete_flow_shift 1.0, and --guidance_scale 0.0.

However, I still get bad results. Can you explain where the shift is? I don't think I understood your idea 「(;´༎ຶД༎ຶ`)」
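Is there anything else I should check on my side? For example, would inspecting the saved LoRA tensors tell me anything? A rough sketch of the kind of check I mean, using the safetensors Python API (the path is from my training command above):

```python
from safetensors import safe_open

# Peek at a few LoRA tensors; all-zero or NaN values would suggest the
# problem is in training itself rather than in the shift settings.
with safe_open("/flux_unet/log/lora/flux-lora-name.safetensors",
               framework="pt", device="cpu") as f:
    for key in sorted(f.keys())[:8]:
        t = f.get_tensor(key)
        print(key, tuple(t.shape), f"mean_abs={t.abs().mean().item():.6f}")
```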

phyllispeng123 · Dec 06 '24

I am also confused about --discrete_flow_shift. What is its purpose? How should it be set during training and inference? Can someone explain it? Thank you very much! By the way, setting guidance_scale=1.0 is a good attempt.

wanglaofei · Dec 27 '24

Any updates?

godwenbin · Mar 24 '25