The `attention_head_dim` argument for `UNet2DConditionModel`
The `attention_head_dim` argument in `UNet2DConditionModel` seems to be passed down to `CrossAttnDownBlock2D` and `CrossAttnUpBlock2D` as the number of attention heads, instead of the dimension of each attention head.
```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel(attention_head_dim=16)

# this prints 16
print(unet.down_blocks[0].attentions[0].transformer_blocks[0].attn1.heads)
```
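To double-check that each head really ends up 20-dimensional rather than 16, one can inspect the attention module directly (a quick sketch against the module layout at the time; the assumption here is that `to_q` is the query projection, whose output width is the transformer's inner dimension):

```python
# hypothetical check: derive the actual per-head dimension
attn = unet.down_blocks[0].attentions[0].transformer_blocks[0].attn1
inner_dim = attn.to_q.out_features  # 320 for the first down block
# 320 channels split over 16 heads -> each head is 20-dimensional, not 16
print(inner_dim // attn.heads)  # prints 20
```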
This definition is not consistent with the other up/down blocks:
```python
down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)

unet = UNet2DConditionModel(
    attention_head_dim=16,
    block_out_channels=(320,),  # added so the lengths match the single block type
    down_block_types=down_block_types,
    up_block_types=up_block_types,
)

# this prints 20 (i.e. 320 // 16)
print(unet.down_blocks[0].attentions[0].num_heads)
```
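The discrepancy comes down to the two block families using the argument in opposite roles (a minimal sketch of the two code paths, not library code):

```python
# minimal sketch of how each block family interprets attention_head_dim
channels, attention_head_dim = 320, 16

# CrossAttnDownBlock2D: the value is forwarded as the head count
cross_attn_num_heads = attention_head_dim          # -> 16 heads
# AttnDownBlock2D: the value is treated as channels per head
attn_num_heads = channels // attention_head_dim    # -> 20 heads

print(cross_attn_num_heads, attn_num_heads)  # 16 20
```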
Is this intended? If not, we could probably swap the positions of the two arguments passed to `Transformer2DModel` from `CrossAttnDownBlock2D`, but I'm not sure whether there is any config somewhere that would need to be updated accordingly.
Hey @yiyixuxu,
Could you maybe add a link to the relevant line in the code? If I had to guess, it is bad naming - maybe you could open a PR to fix it and see if the tests pass? That is probably the easiest way to quickly see what needs to be changed.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, this doesn't seem to have been fixed. In `UNet2DConditionModel`, the parameter `attention_head_dim` (described in the docs as "The dimension of the attention heads.") actually changes the number of heads, not the dimension of each head.
See unet_2d_condition.py:

```python
down_block = get_down_block(
    ...,
    attn_num_head_channels=attention_head_dim[i],
    ...
)
```
which gets passed through into `CrossAttnDownBlock2D`. When adding attention to the block, the first two arguments to `Transformer2DModel` are `num_attention_heads` and `attention_head_dim`, but the number of head channels is passed in first, not the number of heads. See unet_2d_blocks.py:
```python
attentions.append(
    Transformer2DModel(
        attn_num_head_channels,
        out_channels // attn_num_head_channels,
        ...
    )
)
```
and `Transformer2DModel` in transformer_2d.py:
```python
class Transformer2DModel(ModelMixin, ConfigMixin):
    def __init__(
        self,
        num_attention_heads: int = 16,
        attention_head_dim: int = 88,
        ...
```
This is not only confusing because changing `attention_head_dim` doesn't have the desired result; it also means that the number of heads currently stays the same throughout the network (including for cross attention). With the default config, when the number of channels is 1280, each attention head is 1280 / 8 = 160 dimensional, which is too large for Flash Attention to be used.
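For concreteness, the arithmetic at the deepest level (128 is the commonly cited FlashAttention head-dimension limit, quoted here as an assumption about the kernel, not from diffusers):

```python
channels = 1280
num_heads = 8                  # stays fixed everywhere due to the bug
head_dim = channels // num_heads
print(head_dim)                # 160
print(head_dim <= 128)         # False -> FlashAttention kernels can't be used
```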
So, assuming I haven't made any mistakes, in my opinion it would be better to simply change unet_2d_blocks.py to:
```python
attentions.append(
    Transformer2DModel(
        out_channels // attn_num_head_channels,
        attn_num_head_channels,
        ...
    )
)
```
meaning that `attention_head_dim` really is the number of head channels, and the number of heads is then determined accordingly.

Thanks for your time!
Gentle ping here @yiyixuxu and @sayakpaul as well
Thank you @samb-t for such a detailed report, really appreciate it. Indeed your observations are correct.
I would like to also ask @williamberman's opinion on this.
@yiyixuxu would you like to fix it?
@patrickvonplaten yep 😅
@patrickvonplaten ohh, now I remember why I didn't fix it earlier - this will break the models, no?
https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131
I think the confusion must have come from the fact that in the stable diffusion repo the number of heads is fixed to 8 throughout the UNet, e.g. here, which must be why the models in diffusers work despite the incorrect naming. This doesn't seem to be the case for other approaches, e.g. the Stability AI one here, where the number of head channels is fixed and the config changes the number of heads accordingly.
So yeah, to fix it the code would need changing, and all the necessary configs too. It seems a bit silly to stick with a very confusing parameter for the sake of not breaking the current setup? (Although I don't know how painful it would be to change all the configs.) It would need to be decided whether to keep controlling the number of heads by:
- Altering the number of head channels, in which case e.g. this config entry becomes:

```json
"attention_head_dim": [
    40,
    80,
    160,
    160
]
```
or
- Following the stable diffusion code and altering the number of heads directly, in which case the config becomes:

```json
"num_attention_heads": 8
```
If all configs have `_diffusers_version`, then it could potentially be made backwards compatible by "incorrectly" using the config (i.e. in the old, expected way) if the version is old?
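Such a version gate could look roughly like this (purely an illustrative sketch; the helper name and the 0.18.0 cutoff are made up):

```python
from packaging import version

def resolve_num_heads(config: dict, out_channels: int) -> int:
    """Hypothetical helper: interpret attention_head_dim by config age."""
    written_by = version.parse(config.get("_diffusers_version", "0.0.0"))
    value = config["attention_head_dim"]
    if written_by < version.parse("0.18.0"):
        # old configs: the value was (incorrectly) the number of heads
        return value
    # new configs: the value is the per-head channel count
    return out_channels // value
```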
Solving it here: https://github.com/huggingface/diffusers/pull/3797/files
Thanks for the super clear problem description @samb-t. We sadly cannot change all configs (we have 50,000 configs on the Hub and there are many more that are not on the Hub). Just changing the naming is simply too backwards breaking.
I think #3797 is a good way to prevent breaking everything while at the same time fixing the naming problem (we just let `num_attention_heads` default to `attention_head_dim`). This means that the config input `attention_head_dim` will at first become useless once someone defines `num_attention_heads` in the config, but I think this is OK for now.
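In other words, roughly (a simplified sketch of the fallback, not the literal diff in #3797):

```python
def effective_num_heads(num_attention_heads, attention_head_dim):
    # if num_attention_heads is unset, fall back to the misnamed
    # attention_head_dim so every existing config keeps working unchanged
    return num_attention_heads or attention_head_dim

print(effective_num_heads(None, 8))  # old config -> 8 heads, as before
print(effective_num_heads(16, 8))    # new config -> 16 heads, explicit
```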
Wdyt?
Also see: https://github.com/huggingface/diffusers/pull/3797/files#r1230915121
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, I'm getting the following error: Error occurred when executing DownloadAndLoadKolorsModel:
```
At the moment it is not possible to define the number of attention heads via `num_attention_heads` because of a naming issue as described in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131. Passing `num_attention_heads` will only be supported in diffusers v0.19.

File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\Tools\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-KwaiKolorsWrapper\nodes.py", line 80, in loadmodel
    unet = UNet2DConditionModel.from_pretrained(model_path, subfolder='unet', variant="fp16", revision=None, low_cpu_mem_usage=True).to(dtype).eval()
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    model = cls.from_config(config, **unused_kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\configuration_utils.py", line 260, in from_config
    model = cls(**init_dict)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\configuration_utils.py", line 658, in inner_init
    init(self, *args, **init_kwargs)
File "E:\Tools\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\unets\unet_2d_condition.py", line 231, in __init__
    raise ValueError(
```
How can I fix this?
You can use `attention_head_dim` instead: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/unet/config.json#L8
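For example, the linked SDXL config expresses the per-level head counts through the misnamed field (values quoted from that file; because of the naming issue they act as head counts, giving 320 / 5 = 640 / 10 = 1280 / 20 = 64 channels per head):

```json
"attention_head_dim": [
    5,
    10,
    20
]
```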