Apply T5 Attention Mask - Horrible Results
I applied the T5 attention mask, and the results are quite bad.
I think a model trained with the T5 attention mask may need the attention mask at inference too. What about flux_minimum_inference.py?
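For reference, here's roughly what I mean by applying the mask at inference. This is just a rough sketch with plain transformers, not the actual sd-scripts code; the model id, dtype, and 512-token length are only examples of what I assume is going on:

```python
import torch
from transformers import T5EncoderModel, T5TokenizerFast

# Example model id and settings -- assumptions, not what sd-scripts necessarily loads.
tokenizer = T5EncoderModel  # placeholder removed below; see real objects:
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")
text_encoder = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl", torch_dtype=torch.bfloat16)

def encode_with_mask(prompt: str, max_length: int = 512) -> torch.Tensor:
    tokens = tokenizer(
        prompt,
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = text_encoder(
            input_ids=tokens.input_ids,
            attention_mask=tokens.attention_mask,
        ).last_hidden_state
    # Zero the embeddings at padded token positions -- my understanding of
    # what "apply the T5 attention mask" means for the conditioning.
    return hidden * tokens.attention_mask.unsqueeze(-1).to(hidden.dtype)
```

If training used conditioning like this but inference doesn't (or vice versa), the model sees text embeddings it was never trained on, which is why I suspect the inference script matters here.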
There seem to be some bugs in apply_t5_attn_mask. I will fix them as soon as possible.
Any progress on this? I just trained a LoRA with --apply_t5_attn_mask (admittedly just blindly trying it without knowing what it does) and was getting rather prominent vertical bands in many of the epochs (and unfortunately in the best ones, too). And I also just tested flux_minimum_inference and am still getting the vertical bands.
After reading this I reran the training without the parameter and am not getting any vertical bands ... but (I don't know if this is a coincidence) the results with the attention mask were much better overall: more natural poses, better anatomy, less bleeding of character features into people in the background, and less uncanny valley up to much higher epochs.
Btw., if I'm using --apply_t5_attn_mask, do I need to train T5 too? (I was previously training the UNet only and am now doing a test run that also trains CLIP and T5.)
So, I've done some more testing and am posting the results here, because maybe it'll help figure out what's going on and what to test ... or maybe I'm just doing something wrong.
I was using the Dev2Pro model from this article and was getting somewhat decent results without using --apply_t5_attn_mask. I then turned it on hoping for better results, and actually got them, but I also got vertical bands, as can be seen here:
At epoch 15: look at the candle, the cheek, or the ear in the upper right corner.
(Yes, putting that through a two-pass ultimate upscaler gets somewhat usable results most of the time, but eh.)
After reading this issue here, I tried the exact same parameters again but WITHOUT --apply_t5_attn_mask and got this:
Not terrible, but much less realistic, and MUCH fewer good epochs in this run. BUT: no vertical bands.
Then, after trying flux_minimum_inference and still getting the vertical bands, I thought it might be worth testing inference with Dev2Pro itself (as opposed to the article's suggestion to train against Dev2Pro but generate with vanilla Flux-Dev). And when I used Dev2Pro for inference with the same epoch 15 that gave me the artifacts above, I was NOT getting vertical bands. (I was getting rather boring images, but no vertical bands.) So next I tested training against vanilla Flux-Dev with --apply_t5_attn_mask and then using the same model for inference, and sure enough, I'm NOT getting vertical bands:
This is from epoch 6 of that training run and is not very accurate yet, and now I have the problem that with the same parameters the training against vanilla Flux goes totally crazy after epoch 6 (possibly because Prodigy kicked the LR through the roof right around that time). But I guess I'll tinker with the settings some.
No idea if the OP was also using Dev2Pro; their artifacts look a little different. And while I found Dev2Pro easier to train, if using a different model for training and inference is the root of my problem with the T5 attention mask, then obviously I dunno whether it can be fixed inside sd-scripts.
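For what it's worth, my mental model of what --apply_t5_attn_mask changes (just an assumption on my part, not taken from the sd-scripts code) is that it zeroes the T5 embeddings at padded token positions, so any mismatch between how training and inference handle the mask feeds the model conditioning it never saw. A tiny sanity check of that idea with dummy tensors:

```python
# Dummy-tensor check of my assumption that the mask only zeroes padded positions.
# The shapes are what I believe Flux uses for T5-XXL (512 tokens, 4096 dims);
# the 77-token prompt length is made up.
import torch

seq_len, dim = 512, 4096
t5_out = torch.randn(1, seq_len, dim)   # stand-in for the real T5 encoder output
attn_mask = torch.zeros(1, seq_len)
attn_mask[:, :77] = 1                   # pretend the prompt is 77 tokens long

masked = t5_out * attn_mask.unsqueeze(-1)
changed = (masked != t5_out).any(dim=-1)

print(changed[0, :77].any().item())   # False: prompt tokens are untouched
print(changed[0, 77:].all().item())   # True: padded tokens are zeroed out
```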