The `attention_head_dim` argument for `UNet2DConditionModel`
The `attention_head_dim` argument in `UNet2DConditionModel` seems to be passed down to `CrossAttnDownBlock2D` and `CrossAttnUpBlock2D` as the number of attention heads, instead of the dimension of each attention head.
```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel(attention_head_dim=16)

# this prints 16
print(unet.down_blocks[0].attentions[0].transformer_blocks[0].attn1.heads)
```
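To double-check that each head really ends up 20-dimensional rather than 16, one can inspect the attention module directly (a quick sketch against the module layout at the time; the assumption here is that `to_q` is the query projection, whose output width is the transformer's inner dimension):

```python
# hypothetical check: derive the actual per-head dimension
attn = unet.down_blocks[0].attentions[0].transformer_blocks[0].attn1
inner_dim = attn.to_q.out_features  # 320 for the first down block
# 320 channels split over 16 heads -> each head is 20-dimensional, not 16
print(inner_dim // attn.heads)  # prints 20
```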
This definition is not consistent with the other up/down blocks:
```python
down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)

unet = UNet2DConditionModel(
    attention_head_dim=16,
    block_out_channels=(320,),  # added so the lengths match the single block type
    down_block_types=down_block_types,
    up_block_types=up_block_types,
)

# this prints 20 (i.e. 320 // 16)
print(unet.down_blocks[0].attentions[0].num_heads)
```
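The discrepancy comes down to the two block families using the argument in opposite roles (a minimal sketch of the two code paths, not library code):

```python
# minimal sketch of how each block family interprets attention_head_dim
channels, attention_head_dim = 320, 16

# CrossAttnDownBlock2D: the value is forwarded as the head count
cross_attn_num_heads = attention_head_dim          # -> 16 heads
# AttnDownBlock2D: the value is treated as channels per head
attn_num_heads = channels // attention_head_dim    # -> 20 heads

print(cross_attn_num_heads, attn_num_heads)  # 16 20
```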
Is this intended? If not, we could probably swap the positions of the two arguments passed to `Transformer2DModel` from `CrossAttnDownBlock2D`, but I'm not sure whether there is any config somewhere that would need to be updated accordingly.
Hey @yiyixuxu,
Could you maybe add a link to the relevant line in the code? If I had to guess, it is bad naming - maybe you could open a PR to fix it and see if the tests pass? That is probably the easiest way to quickly see what needs to be changed.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, this doesn't seem to have been fixed. In `UNet2DConditionModel`, the parameter `attention_head_dim` (described in the docs as "The dimension of the attention heads.") actually changes the number of heads, not the dimension of each head.
See unet_2d_condition.py:

```python
down_block = get_down_block(
    ...,
    attn_num_head_channels=attention_head_dim[i],
    ...
)
```
which gets passed through into `CrossAttnDownBlock2D`. When adding attention to the block, the first two arguments to `Transformer2DModel` are `num_attention_heads` and `attention_head_dim`, but the number of head channels is passed in first, not the number of heads. See unet_2d_blocks.py:
```python
attentions.append(
    Transformer2DModel(
        attn_num_head_channels,
        out_channels // attn_num_head_channels,
        ...
    )
)
```
and `Transformer2DModel` in transformer_2d.py:
```python
class Transformer2DModel(ModelMixin, ConfigMixin):
    def __init__(
        self,
        num_attention_heads: int = 16,
        attention_head_dim: int = 88,
        ...
```
This is not only confusing because changing `attention_head_dim` doesn't have the desired result; it also means that the number of heads currently stays the same throughout the network (including for cross attention). With the default config, when the number of channels is 1280, each attention head is 1280 / 8 = 160 dimensional, which is too large for Flash Attention to be used.
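For concreteness, the arithmetic at the deepest level (128 is the commonly cited FlashAttention head-dimension limit, quoted here as an assumption about the kernel, not from diffusers):

```python
channels = 1280
num_heads = 8                  # stays fixed everywhere due to the bug
head_dim = channels // num_heads
print(head_dim)                # 160
print(head_dim <= 128)         # False -> FlashAttention kernels can't be used
```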
So, assuming I haven't made any mistakes, in my opinion it would be better to simply change unet_2d_blocks.py to:
```python
attentions.append(
    Transformer2DModel(
        out_channels // attn_num_head_channels,
        attn_num_head_channels,
        ...
    )
)
```
meaning that `attention_head_dim` really is the number of head channels, and the number of heads is then determined accordingly.

Thanks for your time!
Gentle ping here @yiyixuxu and @sayakpaul as well
Thank you @samb-t for such a detailed report, really appreciate it. Indeed your observations are correct.
I would like to also ask @williamberman's opinion on this.
@yiyixuxu would you like to fix it?
@patrickvonplaten yep 😅
@patrickvonplaten ohh, now I remember why I didn't fix it earlier - this will break the models, no?
https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131
I think the confusion must have come from the fact that in the stable diffusion repo the number of heads is fixed to 8 throughout the UNet, e.g. here, which must be why the models in diffusers work despite the incorrect naming. This doesn't seem to be the case for other approaches, e.g. the Stability AI one here, where the number of head channels is fixed and the config changes the number of heads accordingly.
So yeah, to fix it the code would need changing, and all the necessary configs too. It seems a bit silly to stick with a very confusing parameter for the sake of not breaking the current setup? (Although I don't know how painful it would be to change all the configs.) It would need to be decided whether to keep controlling the number of heads by:
- Altering the number of head channels, in which case e.g. this config entry becomes:

```json
"attention_head_dim": [
    40,
    80,
    160,
    160
]
```
or
- Following the stable diffusion code and altering the number of heads directly, in which case the config becomes:

```json
"num_attention_heads": 8
```
If all configs have `_diffusers_version`, then it could potentially be made backwards compatible by "incorrectly" using the config (i.e. in the old, expected way) if the version is old?
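Such a version gate could look roughly like this (purely an illustrative sketch; the helper name and the 0.18.0 cutoff are made up):

```python
from packaging import version

def resolve_num_heads(config: dict, out_channels: int) -> int:
    """Hypothetical helper: interpret attention_head_dim by config age."""
    written_by = version.parse(config.get("_diffusers_version", "0.0.0"))
    value = config["attention_head_dim"]
    if written_by < version.parse("0.18.0"):
        # old configs: the value was (incorrectly) the number of heads
        return value
    # new configs: the value is the per-head channel count
    return out_channels // value
```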
Solving it here: https://github.com/huggingface/diffusers/pull/3797/files
Thanks for the super clear problem description @samb-t. We sadly cannot change all configs (we have 50,000 configs on the Hub and there are many more that are not on the Hub). Just changing the naming is simply too backwards breaking.
I think #3797 is a good way to prevent breaking everything while at the same time fixing the naming problem (we just let `num_attention_heads` default to `attention_head_dim`). This means that the config input `attention_head_dim` will at first become useless once someone defines `num_attention_heads` in the config, but I think this is OK for now.
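In other words, roughly (a simplified sketch of the fallback, not the literal diff in #3797):

```python
def effective_num_heads(num_attention_heads, attention_head_dim):
    # if num_attention_heads is unset, fall back to the misnamed
    # attention_head_dim so every existing config keeps working unchanged
    return num_attention_heads or attention_head_dim

print(effective_num_heads(None, 8))  # old config -> 8 heads, as before
print(effective_num_heads(16, 8))    # new config -> 16 heads, explicit
```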
Wdyt?
Also see: https://github.com/huggingface/diffusers/pull/3797/files#r1230915121
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, I'm getting the following error: Error occurred when executing DownloadAndLoadKolorsModel:
```
At the moment it is not possible to define the number of attention heads via `num_attention_heads` because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131. Passing `num_attention_heads` will only be supported in diffusers v0.19.

File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-KwaiKolorsWrapper\nodes.py", line 80, in loadmodel
    unet = UNet2DConditionModel.from_pretrained(model_path, subfolder='unet', variant="fp16", revision=None, low_cpu_mem_usage=True).to(dtype).eval()
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    model = cls.from_config(config, **unused_kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\configuration_utils.py", line 260, in from_config
    model = cls(**init_dict)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\configuration_utils.py", line 658, in inner_init
    init(self, *args, **init_kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\unets\unet_2d_condition.py", line 231, in __init__
    raise ValueError(
```
How can I fix this?
You can use `attention_head_dim` instead: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/unet/config.json#L8
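For example, the linked SDXL config expresses the per-level head counts through the misnamed field (values quoted from that file; because of the naming issue they act as head counts, giving 320 / 5 = 640 / 10 = 1280 / 20 = 64 channels per head):

```json
"attention_head_dim": [
    5,
    10,
    20
]
```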