sd-scripts
--cache_text_encoder_outputs_to_disk seems broken (in sdxl_train_control_net_lllite.py)
Hello, I was experimenting with LLLite controlnets with extremely poor results, so I tried to reproduce a depth controlnet, which seemed easy in theory. Even the original LLLite depth controlnet, made from SDXL 1.0, works very well on the Illustrious-based models I'm currently testing.
Example images:
Data dir and control dir contain 4001 different images generated by the NoobXL EPS model.
Config: --network_dim 64 --cond_emb_dim 64 --learning_rate 2e-4
Script to launch:
python sdxl_train_control_net_lllite.py \
--pretrained_model_name_or_path /mnt/ccache/sd-models/noob_eps.safetensors \
--train_data_dir "$tpath" \
--conditioning_data_dir "$cpath" \
--cache_latents \
--cache_latents_to_disk \
--cache_text_encoder_outputs \
--cache_text_encoder_outputs_to_disk \
--resolution 896,1152 \
--output_dir /mnt/B/lllite/ \
--output_name "$name" \
--caption_extension .txt \
--save_precision bf16 \
--save_every_n_epochs 1 \
--persistent_data_loader_workers \
--max_data_loader_n_workers 6 \
--mixed_precision bf16 --full_bf16 \
--metadata_title "$name" \
--use_8bit_adam \
--xformers \
--save_state \
--save_state_on_train_end \
--vae_batch_size 4 \
--seed 1 $args "${@:4}"
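One thing worth ruling out before blaming the script: LLLite training pairs each image in --train_data_dir with a conditioning image in --conditioning_data_dir by filename, so a silent mismatch would also produce "no control". A minimal sketch to check the pairing (the paths and the assumption that pairing is by identical filename are mine, not taken from the script):

```python
# List training images that have no same-named conditioning image.
# Assumption: LLLite pairs files by identical filename across the two dirs.
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def find_unpaired(train_dir: str, cond_dir: str) -> list[str]:
    cond_names = {p.name for p in Path(cond_dir).iterdir()
                  if p.suffix.lower() in IMAGE_EXTS}
    return sorted(p.name for p in Path(train_dir).iterdir()
                  if p.suffix.lower() in IMAGE_EXTS
                  and p.name not in cond_names)

# Usage (hypothetical paths): find_unpaired("$tpath", "$cpath")
```

If this returns a non-empty list, some training images are training without their control signal.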
Now, at the 27th epoch, it shows only very weak signs of "control". Is it too small, or are my parameters broken? I don't understand. I suspected the script might be the issue, because the LLLite architecture has essentially been abandoned in favor of big, full-size controlnets.
This is what I got using the above-mentioned depth image as a control:
Previous epochs look barely different.
Found something suspicious:
Text encoder outputs were cached on disk, but they loaded significantly faster than the VAE latents, practically instantly. Maybe they weren't actually being loaded at all?
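A quick way to test that suspicion is to time how long the cached files actually take to read. This is a sketch under my own assumptions: the "_te_outputs.npz" suffix is what I believe sd-scripts writes for text-encoder caches, but check your data dir for the real naming before trusting the pattern.

```python
# Time loading a handful of cached .npz files; a near-zero time together
# with a nonzero file count would suggest the caches are tiny or empty.
# The glob pattern/suffix is an assumption; verify it in your data dir.
import glob
import time
import numpy as np

def time_npz_load(pattern: str, limit: int = 8) -> tuple[int, float]:
    paths = glob.glob(pattern)[:limit]
    start = time.perf_counter()
    for p in paths:
        with np.load(p) as data:
            for key in data.files:
                _ = data[key]  # force each array to actually be read
    return len(paths), time.perf_counter() - start
```

Comparing the result for "*_te_outputs.npz" against the latent caches would show whether the text-encoder caches are really being read.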
I'll try my luck removing the disk caches and the --cache_text_encoder_outputs_to_disk option...
Update: at the 9th epoch, it does seem to work after all.
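For anyone hitting the same thing, clearing the stale caches before rerunning can be done safely with a dry-run pass first. A minimal sketch, assuming the caches are the .npz files sitting next to the images (verify before deleting anything):

```python
# Collect (and optionally delete) on-disk .npz caches so the next run
# regenerates them. dry_run=True only lists; nothing is removed.
from pathlib import Path

def clear_caches(data_dir: str, dry_run: bool = True) -> list[str]:
    removed = []
    for p in sorted(Path(data_dir).rglob("*.npz")):
        removed.append(str(p))
        if not dry_run:
            p.unlink()
    return removed

# Usage: inspect clear_caches("/path/to/data") first,
# then rerun with dry_run=False to actually delete.
```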