
BetterTransformer.transform fails for OPT variants of BLIP2

Open davanstrien opened this issue 2 years ago • 4 comments

System Info

Package version: optimum 1.11.0
System: Google Colab (Linux-5.15.109+-x86_64-with-glibc2.35) 
Python version:  3.10.12

transformers-cli env info:

- `transformers` version: 4.31.0
- Platform: Linux-5.15.109+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.2
- Accelerate version: 0.21.0
- Accelerate config: 	not found
- PyTorch version (GPU?): 2.0.1+cu118 (False)
- Tensorflow version (GPU?): 2.12.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.7.1 (cpu)
- Jax version: 0.4.14
- JaxLib version: 0.4.14
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No


### Who can help?

@younesbelkada 

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

BLIP2 conversion seems to fail for OPT-based checkpoints.

1. Load the Transformers model:

```python
from transformers import Blip2ForConditionalGeneration

model_hf = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
```

2. Attempt to convert it using the BetterTransformer `transform` method:

```python
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model_hf)
```

This raises the following exception:

```
Exception                                 Traceback (most recent call last)
<ipython-input> in <cell line: 3>()
      1 from optimum.bettertransformer import BetterTransformer
      2
----> 3 model = BetterTransformer.transform(model_hf)

/usr/lib/python3.10/contextlib.py in inner(*args, **kwds)
     77     def inner(*args, **kwds):
     78         with self._recreate_cm():
---> 79             return func(*args, **kwds)
     80         return inner
     81

/usr/local/lib/python3.10/dist-packages/optimum/bettertransformer/transformation.py in transform(model, keep_original_model, max_memory, offload_dir, **kwargs)
    264
    265     if BetterTransformerManager.requires_nested_tensor(model_fast.config.model_type):
--> 266         set_last_layer(model_fast)
    267
    268     # Add a class arguments, we might need to identify whether the model

/usr/local/lib/python3.10/dist-packages/optimum/bettertransformer/transformation.py in set_last_layer(model)
    164         return
    165
--> 166     raise Exception(
    167         f"The transformation of the model {model.__class__.__name__} to BetterTransformer failed while it should not. Please fill"
    168         " a bug report or open a PR to support this model at https://github.com/huggingface/optimum/"

Exception: The transformation of the model Blip2ForConditionalGeneration to BetterTransformer failed while it should not. Please fill a bug report or open a PR to support this model at https://github.com/huggingface/optimum/
```




### Expected behavior

I am unsure whether all variants of BLIP2 are supposed to be supported. It is included in the list of supported models [here](https://huggingface.co/docs/optimum/bettertransformer/overview#supported-models), but possibly this is only intended to cover the Flan-T5 backend.

I haven't looked into the Optimum source code super closely, but it seems that BLIP2 assumes FLAN in the model mapping [here](https://github.com/huggingface/optimum/blob/94bf76698638e8a5ef48392e5951375d09e72183/optimum/bettertransformer/models/__init__.py#L62C12-L62C12).

davanstrien avatar Aug 09 '23 08:08 davanstrien

Thank you for the report @davanstrien, I can have a look! cc @baskrahmer as I think you added it.

fxmarty avatar Aug 11 '23 15:08 fxmarty

This script should reproduce the error:

```python
from transformers import Blip2ForConditionalGeneration
from optimum.bettertransformer import BetterTransformer

model_hf = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
model = BetterTransformer.transform(model_hf)
```

I have no hardware available right now to run a model this large, but based on the code I think @davanstrien is right that my implementation only covers the FlanT5 backend. If a Blip2 model with the OPT backend could be added to hf-internal-testing, then I can try to support it.

Code-wise you probably need to add `{"OPTAttention": OPTAttentionLayerBetterTransformer}` to this dict entry.

baskrahmer avatar Aug 13 '23 09:08 baskrahmer
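Concretely, the change @baskrahmer suggests amounts to registering the OPT attention layer alongside the T5 one in the BLIP-2 mapping entry. The sketch below is a simplified stand-in for optimum's internal mapping (the class names mirror `models/__init__.py`, but here they are plain strings rather than the actual layer classes):

```python
# Simplified stand-in for optimum's BetterTransformer model mapping for BLIP-2.
# In the real code the values are layer classes, not strings.
BLIP2_LAYER_MAPPING = {
    "T5Attention": "T5AttentionLayerBetterTransformer",    # Flan-T5 backend (already covered)
    "OPTAttention": "OPTAttentionLayerBetterTransformer",  # proposed addition for OPT backends
}

def lookup(attention_class: str):
    # set_last_layer() fails when the text backend's attention class
    # has no registered BetterTransformer counterpart.
    return BLIP2_LAYER_MAPPING.get(attention_class)
```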


Got the same issue; after adding `{"OPTAttention": OPTAttentionLayerBetterTransformer}` to the dict as specified by @baskrahmer, I still get the same error:

```
Exception: The transformation of the model Blip2ForConditionalGeneration to BetterTransformer failed while it should not. Please fill a bug report or open a PR to support this model at https://github.com/huggingface/optimum/
```

Part of the code I was using for loading and performing the model transform:

```python
model = Blip2ForConditionalGeneration.from_pretrained(
    local_model_path,
    torch_dtype=torch.float16 if cpu_only is False else torch.float32,
)
if cpu_only is True:
    model = BetterTransformer.transform(model, keep_original_model=True)
```

fengzhyuan avatar Oct 23 '23 02:10 fengzhyuan

@fengzhyuan see https://github.com/huggingface/optimum/pull/1488 for a working fix.

baskrahmer avatar Oct 26 '23 16:10 baskrahmer