sd-scripts

--cache_text_encoder_outputs_to_disk seems broken (in sdxl_train_control_net_lllite.py)

Open Nekotekina opened this issue 9 months ago • 2 comments

Hello, I was experimenting with LLLite ControlNets with extremely poor results. I then tried to reproduce a depth ControlNet, which seemed easy in theory. Even the original LLLite depth ControlNet made from SDXL 1.0 works very well on any Illustrious-based models I'm currently testing.

Example images: [image] [image]

The data dir and control dir contain 4001 different images generated by the NoobXL EPS model.

Config: --network_dim 64 --cond_emb_dim 64 --learning_rate 2e-4

Script to launch:

python sdxl_train_control_net_lllite.py \
   --pretrained_model_name_or_path /mnt/ccache/sd-models/noob_eps.safetensors \
   --train_data_dir "$tpath" \
   --conditioning_data_dir "$cpath" \
   --cache_latents \
   --cache_latents_to_disk \
   --cache_text_encoder_outputs \
   --cache_text_encoder_outputs_to_disk \
   --resolution 896,1152 \
   --output_dir /mnt/B/lllite/ \
   --output_name "$name" \
   --caption_extension .txt \
   --save_precision bf16 \
   --save_every_n_epochs 1 \
   --persistent_data_loader_workers \
   --max_data_loader_n_workers 6 \
   --mixed_precision bf16 --full_bf16 \
   --metadata_title "$name" \
   --use_8bit_adam \
   --xformers \
   --save_state \
   --save_state_on_train_end \
   --vae_batch_size 4 \
   --seed 1 $args "${@:4}"
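One way to rule out a stale or missing cache is to check the data directory after the first epoch starts. The sketch below is a hypothetical sanity check, assuming sd-scripts writes its latent and text encoder caches as .npz files next to the images (the exact file naming may differ between sd-scripts versions, so verify it against your own data dir):

```python
# Hypothetical cache sanity check for an sd-scripts train_data_dir.
# Assumption: caches are stored as .npz files alongside the images;
# a suspiciously small or zero-byte file suggests a broken cache.
import glob
import os

def report_caches(data_dir: str) -> dict:
    """Map each cached .npz file name to its size in bytes."""
    sizes = {}
    for path in glob.glob(os.path.join(data_dir, "*.npz")):
        sizes[os.path.basename(path)] = os.path.getsize(path)
    return sizes

if __name__ == "__main__":
    for name, size in sorted(report_caches(".").items()):
        print(f"{name}: {size} bytes")
```

If the number of .npz files does not match the number of training images, or many of them are tiny, the caching pass likely did not complete.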

Now, at the 27th epoch, it shows only very weak signs of "control". Is it too small, or are my parameters broken? I don't understand. I suspected the script itself might be the issue, since the LLLite architecture has essentially been abandoned in favor of big, full-size ControlNets.

Nekotekina avatar Mar 01 '25 10:03 Nekotekina

This is what I got using the above-mentioned depth image as a control:

Image

The outputs from previous epochs are barely any different.

ghost avatar Mar 01 '25 10:03 ghost

Found something suspicious:

Image

The text encoder outputs were cached on disk, but they loaded significantly faster than the VAE latents, practically instantly. Maybe they weren't actually being loaded at all?
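A quick way to test that suspicion is to open one of the cached files directly and confirm it contains real embedding arrays rather than being empty or stale. This is a hedged sketch, assuming the text encoder cache is a NumPy .npz archive (the path and array names are placeholders; check what sd-scripts actually wrote in your data dir):

```python
# Hypothetical inspection of one cached text encoder output file.
# Assumption: the cache is a .npz archive; the keys and shapes below
# depend on the sd-scripts version that produced it.
import numpy as np

def inspect_cache(path: str) -> dict:
    """Return a mapping of array name -> shape for a cached .npz file."""
    with np.load(path) as data:
        return {key: data[key].shape for key in data.files}

if __name__ == "__main__":
    # Placeholder path; point this at a real cache file.
    for key, shape in inspect_cache("example_te_cache.npz").items():
        print(key, shape)
```

If the archive is empty, or the shapes don't look like token-length by hidden-size embeddings, that would explain the instant "load" times.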

I'll try my luck with removing the disk caches and the --cache_text_encoder_outputs_to_disk option... Update: it's now the 9th epoch, and training does seem to work after all.

ghost avatar Mar 03 '25 19:03 ghost