Has anyone already fine-tuned a Flux model (not LoRA)? Why is the image quality of my fine-tuned model so poor?

Open xiaohaier123 opened this issue 1 year ago • 10 comments

I pulled the code from the sd3 branch and used the flux_train.py script to fine-tune on my own 130,000 images, but after 30,000 steps the inference images are poor quality: blurry, with deformed human bodies.

xiaohaier123 avatar Oct 08 '24 06:10 xiaohaier123

Could you share the current training script? I am trying to fine-tune with less data, and I will share the relevant results here.

kelisiya avatar Oct 08 '24 10:10 kelisiya

I'm having the same issue fine-tuning with the sd3 branch. I wasn't sure whether it was the settings or the low step count (1,000 steps on 10 images), but I tried DreamBooth on SD1.5 with those settings before and the results were good.

voytez avatar Oct 09 '24 08:10 voytez

I've had good luck just finetuning through lora and merging.

AbstractEyes avatar Oct 12 '24 18:10 AbstractEyes

> I'm having the same issue fine-tuning with the sd3 branch. I wasn't sure whether it was the settings or the low step count (1,000 steps on 10 images), but I tried DreamBooth on SD1.5 with those settings before and the results were good.

Unfortunately SD1.5/2/SDXL training with sd3 branch is not tested yet. Please use main branch for them.

kohya-ss avatar Oct 12 '24 23:10 kohya-ss

Flux is a distilled model that isn't meant for fine-tuning, but LoRA works well. Maybe a distilled model can't be used as the base for full fine-tuning?

ymzlygw avatar Nov 08 '24 01:11 ymzlygw

Any conclusion here?

> I pulled the code from the sd3 branch and used the flux_train.py script to fine-tune on my own 130,000 images, but after 30,000 steps the inference images are poor quality: blurry, with deformed human bodies.

Any conclusion here? I'm encountering the same problem.

Doris-UESTC avatar Feb 12 '25 07:02 Doris-UESTC

@Doris-UESTC @xiaohaier123 what were your params? I have the same problem, so I'm interested to know what your experience is.

joeyism avatar Feb 12 '25 20:02 joeyism

Hello @xiaohaier123, I fine-tuned a Flux base model from flux1-dev.safetensors with the config below:

accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train.py \
    --pretrained_model_name_or_path flux1-dev.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors --ae ae.safetensors \
    --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 \
    --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 \
    --dataset_config dataset_1024_bs1.toml --output_dir output_flux --output_name trained_flux \
    --learning_rate 5e-5 --max_train_epochs 10 --highvram --cache_text_encoder_outputs_to_disk --cache_latents_to_disk --save_every_n_epochs 1 \
    --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" \
    --lr_scheduler constant_with_warmup --max_grad_norm 0.0 \
    --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0 \
    --fused_backward_pass --blocks_to_swap 8 --full_bf16

But I cannot run inference, either with the pipeline loading code below or in ComfyUI:

pipeline = AutoPipelineForText2Image.from_pretrained('trained_flux.safetensors', torch_dtype=torch.float16).to('cuda')
image = pipeline(prompt).images[0]

The error from the pipeline code is 'It looks like the config file at 'trained_flux.safetensors' is not a valid JSON file.' The error from ComfyUI is 'clip input is invalid..'. I wonder how you ran inference with the fine-tuned Flux base model. Thank you!
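For context on the first error: `from_pretrained` expects a model repository or directory with config files, not a single weights file. Below is a minimal sketch of one way this kind of checkpoint can be loaded in diffusers, not necessarily the fix used in this thread. It assumes the file written by flux_train.py holds only the Flux transformer weights, that you have access to the gated `black-forest-labs/FLUX.1-dev` repo for the text encoders and VAE, and that bf16 is the right dtype because training used `--save_precision bf16`. The helper names `check_checkpoint` and `load_finetuned_flux` are made up for illustration.

```python
from pathlib import Path


def check_checkpoint(path: str) -> Path:
    """Fail early with a readable message if the path is not a .safetensors file name."""
    p = Path(path)
    if p.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got: {path}")
    return p


def load_finetuned_flux(checkpoint: str, base_repo: str = "black-forest-labs/FLUX.1-dev"):
    """Build a FluxPipeline whose transformer comes from a single fine-tuned file.

    Assumption: the checkpoint contains only the transformer; the text encoders,
    VAE, and scheduler are taken from base_repo.
    """
    # Heavy imports stay inside the function so this module imports without a GPU stack.
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    ckpt = check_checkpoint(checkpoint)
    # Load only the transformer weights from the single-file checkpoint.
    transformer = FluxTransformer2DModel.from_single_file(str(ckpt), torch_dtype=torch.bfloat16)
    # Reuse every other component from the base repo and swap in the fine-tuned transformer.
    pipeline = FluxPipeline.from_pretrained(base_repo, transformer=transformer, torch_dtype=torch.bfloat16)
    return pipeline.to("cuda")


# Example usage (needs a CUDA GPU and the downloaded base model):
# pipeline = load_finetuned_flux("trained_flux.safetensors")
# image = pipeline("a photo of a cat", guidance_scale=3.5, num_inference_steps=28).images[0]
```

The ComfyUI 'clip input is invalid' message is consistent with the same cause, a checkpoint that carries no text encoders, though that is a guess rather than something confirmed in this thread.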

WenY2020 avatar Jun 24 '25 00:06 WenY2020

I figured out the answers to my questions above for both ComfyUI and the diffusers pipeline, and I'm now facing the same issue as everyone here: very poor image quality and deformed human bodies. I'm also wondering why this happens.

WenY2020 avatar Jun 25 '25 11:06 WenY2020

> I figured out the answers to my questions above for both ComfyUI and the diffusers pipeline, and I'm now facing the same issue as everyone here: very poor image quality and deformed human bodies. I'm also wondering why this happens.

Could you tell me how you got inference working with the trained safetensors file? Thanks a lot.

zyf2316 avatar Jul 16 '25 09:07 zyf2316