CodeGenAttention does not work with defaults in forward pass
System Info
- transformers version: 4.28.1
- Platform: Linux-5.15.0-1034-azure-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.13.4
- Safetensors version: not installed
- PyTorch version (GPU?): 2.0.0a0+1767026 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
The current version of models.codegen.modeling_codegen.CodeGenAttention throws an error on line 193 of its forward pass when position_ids is not specified and defaults to None. This can be fixed by defining a default position_ids (e.g. as self.position_ids in the init). The issue was introduced in commit 4e94c6c.
@ArthurZucker @younesbelkada
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
import torch
from transformers.models.codegen.modeling_codegen import CodeGenAttention
from transformers import AutoConfig
config = AutoConfig.from_pretrained("Salesforce/codegen-350M-nl")
model = CodeGenAttention(config)
x = torch.randn(4, config.n_ctx, config.n_embd)
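# this call fails: position_ids defaults to None and is not handled inside forward (line 193 in v4.28.1)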
model(x)
Expected behavior
The module should default position_ids to torch.arange(seq_offset, seq_len) when none are passed, instead of raising an error (see the sketch below).
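As a hypothetical sketch only (mirroring how CodeGenModel builds position_ids from the past length; this is not actual library code), such a default inside CodeGenAttention.forward could look like:

if position_ids is None:
    # infer how many positions were already consumed from the cached keys, if any
    past_length = layer_past[0].size(-2) if layer_past is not None else 0
    position_ids = torch.arange(
        past_length, past_length + hidden_states.size(1),
        dtype=torch.long, device=hidden_states.device,
    ).unsqueeze(0)  # shape (1, seq_len), broadcast over the batch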
Hi @sgunasekar, thanks for the issue!
As per my understanding, since CodeGenAttention is not a public class, it should only be used by CodeGenModel. In the modeling script, if position_ids is None, we indeed manually create position_ids based on the past length and sequence length. I personally don't think we should do this inside CodeGenAttention; if you want to use that class as a standalone module, you should manually create position ids and pass them in the forward pass (a sketch follows below).
I also want to hear from @ArthurZucker, @sgugger, and @amyeroberts to make sure we are aligned on this.
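For completeness, a minimal sketch of that workaround (the shape of position_ids here mirrors what CodeGenModel builds when none are passed; treat it as an illustration, not an official API):

import torch
from transformers import AutoConfig
from transformers.models.codegen.modeling_codegen import CodeGenAttention

config = AutoConfig.from_pretrained("Salesforce/codegen-350M-nl")
attn = CodeGenAttention(config)

hidden_states = torch.randn(4, config.n_ctx, config.n_embd)
# build position ids 0 .. seq_len-1 manually, as CodeGenModel does internally
position_ids = torch.arange(config.n_ctx, dtype=torch.long).unsqueeze(0)
outputs = attn(hidden_states, position_ids=position_ids)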
👍🏻 on @younesbelkada's answer. On almost all of our attention modules, these inputs should be passed in explicitly; there is no reason to give them a default value, because this is handled in the modeling file.
Completely in line with the comments above.
+1 - As @younesbelkada says, CodeGenAttention isn't a public class, and this is easily resolved by passing position_ids directly to the layer.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.