For the Flux model, after performing fine-tuning/DreamBooth using flux_train.py, the generated images are corrupted.
The training parameters are as follows:
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train.py --pretrained_model_name_or_path ./flux1-dev.safetensors --clip_l ./clip_l.safetensors --t5xxl ./t5xxl_fp16.safetensors --ae ./ae.safetensors --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 1 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --dataset_config ./clothes.toml --output_dir ./output_model --output_name daofu_clothes_v1 --learning_rate 5e-5 --max_train_epochs 10 --sdpa --highvram --cache_text_encoder_outputs_to_disk --cache_latents_to_disk --save_every_n_epochs 2 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0 --fused_backward_pass --blocks_to_swap 8 --full_bf16
The data toml file is configured as follows:
`[general]
flip_aug = true
color_aug = false
keep_tokens_separator = "|||"
shuffle_caption = false
caption_tag_dropout_rate = 0
caption_extension = ".txt"
[[datasets]]
batch_size = 10
enable_bucket = true
resolution = [1024,1280]
max_bucket_reso = 2048
[[datasets.subsets]]
image_dir = "/workspace/tangfan/code_store/lrq/sd-scripts/yifu_mote"
num_repeats = 1`
Here is a sample of the contents of the generated metadata.json file:
{
  "/home/sda/tangfan/code_store/lrq/sd-scripts/yifu_mote/1000_clothes_.jpg": {
    "tags": "The image shows a person wearing a stylish outfit that combines a hoodie and a skirt. The hoodie is cream-colored, made of a soft fabric, and features a drawstring hood. It has long sleeves and a relaxed fit, giving it a casual and comfortable look. The skirt is also cream-colored and appears to be made of a crocheted or knitted material, with intricate patterns and fringes. The skirt has a layered design, with a lace-like pattern at the top and a series of fringes at the bottom, adding a touch of elegance and movement to the outfit. The person is also wearing white, chunky winter boots with fur lining, which complement the outfit and provide warmth. The overall look is cozy and stylish, suitable for a casual day out."
  },
  "/home/sda/tangfan/code_store/lrq/sd-scripts/yifu_mote/1001_clothes_.jpg": {
    "tags": "The image shows a person wearing a stylish outfit. The top is a cream-colored, chunky knit sweater with a high turtleneck. It features a unique design element on the sleeve, with a decorative, lacy pattern that adds a touch of elegance. The sweater has a loose, relaxed fit, giving it a cozy and comfortable appearance. The bottom half of the outfit consists of a black leather skirt with a shiny finish. The skirt has a wrap-around design, creating a layered look that adds texture and interest to the outfit. The combination of the soft knit sweater and the sleek leather skirt creates a stylish contrast between textures and materials."
  }
}
The corresponding image:
The structure of the data files:
The inference results from the trained checkpoint.
What could be the possible reasons for this?
Hard to say. The output looks a little unstable, which may indicate that the hyperparameters are not working well. A learning rate of 5e-5 might be too high for full model fine-tuning, but you may need to experiment to isolate the issue. It is too complicated to give good training parameters in this format (via issues), but I would check whether you get improvements by adjusting the parameters.
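One way to run that experiment is to keep the command from the issue fixed and change only the learning rate, so any difference in the samples can be attributed to that one setting. The following is a minimal sketch, not a recommended configuration: the candidate learning rates and the `_lr...` output-name suffix are illustrative assumptions, and `override_flags` is a hypothetical helper, not part of sd-scripts. It only prints the commands; it does not launch training.

```python
import shlex

# The command from the issue, copied as-is (including its repeated flags).
BASE_COMMAND = (
    "accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 "
    "flux_train.py --pretrained_model_name_or_path ./flux1-dev.safetensors "
    "--clip_l ./clip_l.safetensors --t5xxl ./t5xxl_fp16.safetensors "
    "--ae ./ae.safetensors --save_model_as safetensors --sdpa "
    "--persistent_data_loader_workers --max_data_loader_n_workers 1 --seed 42 "
    "--gradient_checkpointing --mixed_precision bf16 --save_precision bf16 "
    "--dataset_config ./clothes.toml --output_dir ./output_model "
    "--output_name daofu_clothes_v1 --learning_rate 5e-5 --max_train_epochs 10 "
    "--sdpa --highvram --cache_text_encoder_outputs_to_disk "
    "--cache_latents_to_disk --save_every_n_epochs 2 --optimizer_type adafactor "
    '--optimizer_args "relative_step=False" "scale_parameter=False" '
    '"warmup_init=False" --lr_scheduler constant_with_warmup '
    "--max_grad_norm 0.0 --timestep_sampling shift --discrete_flow_shift 3.1582 "
    "--model_prediction_type raw --guidance_scale 1.0 --fused_backward_pass "
    "--blocks_to_swap 8 --full_bf16"
)

# Assumption: illustrative values at and below the original 5e-5.
CANDIDATE_LEARNING_RATES = ["5e-5", "2e-5", "1e-5", "5e-6"]


def override_flags(command, overrides):
    """Return `command` with the value following each flag in `overrides` replaced."""
    tokens = shlex.split(command)
    missing = set(overrides) - set(tokens)
    if missing:
        raise ValueError(f"flags not found in command: {sorted(missing)}")
    for i, token in enumerate(tokens[:-1]):
        if token in overrides:
            tokens[i + 1] = str(overrides[token])
    return shlex.join(tokens)


# One command per learning rate; everything else (seed, dataset, optimizer) is unchanged.
commands = [
    override_flags(
        BASE_COMMAND,
        {
            "--learning_rate": lr,
            # Hypothetical naming scheme so checkpoints from different runs do not collide.
            "--output_name": f"daofu_clothes_v1_lr{lr}",
        },
    )
    for lr in CANDIDATE_LEARNING_RATES
]

for command in commands:
    print(command)
    print()
```

Because the seed and dataset config stay the same across runs, comparing samples from the saved checkpoints of each run should make it easier to tell whether the corruption tracks the learning rate or comes from something else.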