Bad FLUX LoRA training result
Hi @kohya-ss, thank you for your detailed and excellent work on FLUX finetuning and LoRA training!! I got bad results when I ran the sample LoRA training script with network_dim=32, 50 high-quality 1024×1024 input images, and max_train_epochs=50 (2500 steps in total).
python==3.10.15
torch==2.4.0
torchmetrics==1.6.0
torchvision==0.19.0
transformers==4.44.0
accelerate==0.33.0
xformers==0.0.23.post1
diffusers==0.25.0
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 --main_process_port 23333 \
  flux_train_network.py \
  --pretrained_model_name_or_path /black-forest-labs/FLUX.1-schnell/flux1-schnell.safetensors \
  --clip_l /SD3/text_encoders/clip_l.safetensors \
  --t5xxl /SD3/text_encoders/t5xxl_fp16.safetensors \
  --ae /black-forest-labs/FLUX.1-schnell/ae.safetensors \
  --cache_latents_to_disk \
  --save_model_as safetensors \
  --sdpa \
  --persistent_data_loader_workers \
  --max_data_loader_n_workers 2 \
  --seed 42 \
  --gradient_checkpointing \
  --mixed_precision bf16 \
  --save_precision bf16 \
  --network_module networks.lora_flux \
  --network_dim 32 \
  --network_train_unet_only \
  --optimizer_type adamw8bit \
  --learning_rate 1e-4 \
  --cache_text_encoder_outputs \
  --cache_text_encoder_outputs_to_disk \
  --highvram \
  --max_train_epochs 50 \
  --save_every_n_epochs 1 \
  --dataset_config flux_image_50.toml \
  --output_dir /flux_unet/log/lora \
  --output_name flux-lora-name \
  --timestep_sampling shift \
  --discrete_flow_shift 3.1582 \
  --model_prediction_type raw \
  --guidance_scale 1.0
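The --dataset_config file follows the standard sd-scripts dataset TOML layout, roughly like this (the image_dir path, caption extension, and repeat count below are placeholders, not my actual flux_image_50.toml):

[general]
enable_bucket = false        # all images are already 1024x1024
caption_extension = ".txt"   # assumes one .txt caption per image

[[datasets]]
resolution = 1024
batch_size = 1               # 50 images x 50 epochs at batch 1 gives the 2500 steps above

  [[datasets.subsets]]
  image_dir = "/path/to/training_images"  # placeholder path
  num_repeats = 1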
I got bad inference results when running:
python3 flux_minimal_inference.py \
  --ckpt black-forest-labs/FLUX.1-schnell/flux1-schnell.safetensors \
  --clip_l /SD3/text_encoders/clip_l.safetensors \
  --t5xxl /SD3/text_encoders/t5xxl_fp16.safetensors \
  --ae /black-forest-labs/FLUX.1-schnell/ae.safetensors \
  --dtype bf16 \
  --prompt "A small cactus with a happy face in the Sahara desert." \
  --out /flux_unet/log/lora \
  --seed 42 \
  --flux_dtype fp8 \
  --offload \
  --lora "/flux_unet/log/lora/flux-lora-name.safetensors;1.0"
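The ";1.0" after the LoRA path is the strength multiplier (quoting the argument also keeps the shell from treating ";" as a command separator). Conceptually, applying a LoRA at strength m adds m * (alpha / rank) * (B @ A) to each targeted weight; a minimal sketch of that math, not the script's actual loading code:

import torch

# Sketch: applying a LoRA delta at strength m to a base weight W.
# A: (rank, in_features) down-projection, B: (out_features, rank) up-projection.
def apply_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, m: float = 1.0) -> torch.Tensor:
    rank = A.shape[0]
    return W + m * (alpha / rank) * (B @ A)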
The comparison between the original FLUX output (upper) and the LoRA-added output (lower):
whereas my training images are very good (like this):
Can you give me some hints about it? Thank you so much!!
Here are a few things to point you in the right direction...
- It's been my experience that schnell is more prone to having issues in training than dev. I don't know if using dev is an option for you, but if it is, you may want to rerun using that.
- 50 images is too many. Try 20-30 with a consistent concept, sometimes fewer. I've done several successful LoRAs with a single image. More pictures may take longer to converge.
- With an LR of only 0.0001, more than 2500 steps may be needed, especially with 50 pictures. Typically I would use an LR of 0.00015 with 20 pictures for 4000 steps; sometimes it would converge at 2000, and sometimes it would need the full run.
- Your sample image is AI-generated and has artifacts. These can be magnified through training. I don't know if more of your training set is AI-generated, but that could be the issue as well.
Check your inference workflow to see if it includes a shift node. Early FLUX workflows did not apply shift, while the training script does.
@sdbds The training script includes --timestep_sampling shift and --discrete_flow_shift 3.1582. Should I delete them, or change --timestep_sampling to another sampling type? I just found that the flux-schnell scheduler has shift 1; should I change --discrete_flow_shift to 1?
If you train schnell, it should not use any shift. Only dev has shift; you should remove the related shift parameters during training.
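To make the shift concrete: with --timestep_sampling shift, the training timestep t in (0, 1) is warped toward the noisier end via t' = s*t / (1 + (s - 1)*t). A minimal sketch of that mapping (my own names, not the script's exact code):

import torch

# Sketch: how a discrete flow shift warps sampled timesteps.
# shift > 1 biases training toward noisier timesteps; shift = 1 is the identity.
def sample_shifted_timesteps(batch_size: int, shift: float = 3.1582) -> torch.Tensor:
    t = torch.sigmoid(torch.randn(batch_size))  # base sample in (0, 1)
    return (t * shift) / (1 + (shift - 1) * t)

Since shift = 1 leaves t unchanged, setting --discrete_flow_shift 1.0 effectively disables the warp.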
@sdbds
I just changed the training script to:
--timestep_sampling uniform \
--discrete_flow_shift 1.0 \
--guidance_scale 0.0 \
However, I still get bad results. Can you explain where the shift is? I think I did not get your idea 「(;´༎ຶД༎ຶ`)」
I am also confused about --discrete_flow_shift. What is its purpose, and how should it be set during training and inference? Can someone explain it? Thank you very much! By the way, setting --guidance_scale 1.0 is a good attempt.
Any updates?