Attention masks are missing in SD3 to mask out text padding tokens
Describe the bug
In the attention implementation of SD3, attention masks are currently not used. As a result, padding tokens receive non-zero attention scores, so the output changes with the value of `max_seq_length` whenever the text tokens contain padding. This behavior has been discussed in https://github.com/huggingface/diffusers/discussions/8628; this issue is created to track progress on fixing it.
Thanks @sayakpaul for the discussion.
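To make the effect concrete, here is a minimal sketch in plain PyTorch (independent of the diffusers codebase; all names are illustrative) showing that unmasked padding changes the attention output at the real token positions, while a boolean key-padding mask makes the output invariant to the padded length:

```python
import torch
import torch.nn.functional as F

# Minimal sketch (plain PyTorch, not the diffusers code): the same prompt
# padded to two different lengths produces different attention outputs at
# the real token positions unless the padding keys are masked out.
torch.manual_seed(0)
dim = 8
prompt = torch.randn(1, 4, dim)  # 4 real text tokens

def attend(tokens, mask=None):
    # Self-attention over the (possibly padded) token sequence.
    return F.scaled_dot_product_attention(tokens, tokens, tokens, attn_mask=mask)

x_short = torch.cat([prompt, torch.zeros(1, 2, dim)], dim=1)   # padded to length 6
x_long = torch.cat([prompt, torch.zeros(1, 12, dim)], dim=1)   # padded to length 16

# Without a mask the padding keys get non-zero softmax weight, so the
# output at the real positions depends on how much padding was added.
print(torch.allclose(attend(x_short)[:, :4], attend(x_long)[:, :4]))  # False

def key_padding_mask(total_len, real_len):
    # Boolean mask over key positions; True = attend, False = ignore.
    mask = torch.zeros(1, 1, total_len, dtype=torch.bool)
    mask[..., :real_len] = True
    return mask

# With the mask, padding is ignored and the real-token outputs agree
# regardless of the padded sequence length.
out_short = attend(x_short, key_padding_mask(6, 4))
out_long = attend(x_long, key_padding_mask(16, 4))
print(torch.allclose(out_short[:, :4], out_long[:, :4]))  # True
```

A fix along these lines would presumably build the mask from the tokenizer's `attention_mask` output and thread it through to the joint attention call, though the exact wiring in the SD3 pipeline is for the maintainers to decide.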
Reproduction
n/a
Logs
No response
System Info
n/a
Who can help?
No response