Deep Dive NB: Quick Fix for AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'
In the Stable Diffusion Deep Dive notebook, in the code cell immediately following the Transformer diagram, there is the definition of get_output_embeds, which includes a call to text_encoder.text_model._build_causal_attention_mask:
def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    ...
That line currently generates an error for me when I run the notebook on Colab (from a fresh instance) or on my home computer:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-33-dbb74b7ec9b4> in <cell line: 26>()
24 return output
25
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
27 print(out_embs_test.shape) # Check the output shape
28 out_embs_test # Inspect the output
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
1616
AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'
Everything in the notebook prior to that line runs fine.
Perhaps I'm doing something wrong, or perhaps something has changed in the HF libraries being used since the notebook was originally written?
UPDATE:
I see the same issue here: https://github.com/drboog/ProFusion/issues/12. It seems that transformers has changed. Downgrading to version 4.25.1 fixed the problem.
Thus, changing the pip install line at the top of the notebook to

!pip install -q --upgrade transformers==4.25.1 diffusers ftfy

...will restore full functionality.
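(On Colab you may need to restart the runtime after the install for the pin to take effect; a quick sanity check:)

import transformers
print(transformers.__version__)  # should print 4.25.1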
Feel free to close this issue at your convenience. Perhaps a PR is in order. Presumably some way to keep up to date with transformers would be preferable, but for now this is a quick fix.
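One version-agnostic possibility (a minimal sketch, assuming the only thing needed from the removed private method is the causal mask; fallback_causal_attention_mask is a name introduced here for illustration) would be to fall back to a local reimplementation inside get_output_embeds whenever the attribute is missing:

import torch

def fallback_causal_attention_mask(bsz, seq_len, dtype):
    # Additive causal mask: most-negative values strictly above the diagonal,
    # zeros on and below it, shaped (bsz, 1, seq_len, seq_len).
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask.triu_(1)
    return mask.unsqueeze(1)

# Inside get_output_embeds: use the private method if this transformers
# version still has it, otherwise use the local fallback above.
build_mask = getattr(text_encoder.text_model, "_build_causal_attention_mask",
                     fallback_causal_attention_mask)
causal_attention_mask = build_mask(bsz, seq_len, dtype=input_embeddings.dtype)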
@drscotthawley Another fix, without having to downgrade, could be to use what the function _build_causal_attention_mask used to do (from here):
def build_causal_attention_mask(bsz, seq_len, dtype):
    # lazily create causal attention mask, with full attention between the vision tokens
    # pytorch uses additive attention mask; fill with -inf
    mask = torch.empty(bsz, seq_len, seq_len, dtype=dtype)
    mask.fill_(torch.tensor(torch.finfo(dtype).min))
    mask.triu_(1)  # zero out the lower diagonal
    mask = mask.unsqueeze(1)  # expand mask
    return mask
PS: Thanks for the mps support changes
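Putting the pieces together, here is a complete replacement cell (same logic as the notebook, just with the local mask builder swapped in):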
import torch

def build_causal_attention_mask(bsz, seq_len, dtype):
    mask = torch.empty(bsz, seq_len, seq_len, dtype=dtype)
    mask.fill_(torch.tensor(torch.finfo(dtype).min))  # fill with a large negative number (acts like -inf)
    mask = mask.triu_(1)  # zero out the lower triangle to enforce causality
    return mask.unsqueeze(1)  # add a broadcast dimension for the attention heads

# Update get_output_embeds to use the new mask function
def get_output_embeds(input_embeddings):
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    # Getting the output embeddings involves calling the model with output_hidden_states=True
    # so that it doesn't just return the pooled final predictions:
    encoder_outputs = text_encoder.text_model.encoder(
        inputs_embeds=input_embeddings,
        attention_mask=None,  # We aren't using an attention mask, so that can be None
        causal_attention_mask=causal_attention_mask.to(torch_device),
        output_attentions=None,
        output_hidden_states=True,  # We want the output embs, not the final output
        return_dict=None,
    )
    # We're interested in the output hidden state only
    output = encoder_outputs[0]
    # There is a final layer norm we need to pass these through
    output = text_encoder.text_model.final_layer_norm(output)
    # And now they're ready!
    return output

out_embs_test = get_output_embeds(input_embeddings)  # Feed through the model with our new function
print(out_embs_test.shape)  # Check the output shape
out_embs_test  # Inspect the output
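For reference, with the standard SD v1.x CLIP text encoder and a single prompt padded to 77 tokens (the notebook's default), the printed shape should be torch.Size([1, 77, 768]).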
If you install a newer transformers version, e.g. 4.40.2, you can implement it as below:
from transformers.modeling_attn_mask_utils import _create_4d_causal_attention_mask

def get_output_embeds(input_embeddings):
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = _create_4d_causal_attention_mask((bsz, seq_len), dtype=input_embeddings.dtype, device=torch_device)
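The rest of the function is unchanged from the version above; for completeness, a sketch of how the body continues (note the mask is created directly on torch_device here, so no extra .to() call is needed):

    encoder_outputs = text_encoder.text_model.encoder(
        inputs_embeds=input_embeddings,
        attention_mask=None,
        causal_attention_mask=causal_attention_mask,
        output_attentions=None,
        output_hidden_states=True,
        return_dict=None,
    )
    output = encoder_outputs[0]  # the last hidden state
    output = text_encoder.text_model.final_layer_norm(output)
    return output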