CodeGenAttention does not work with defaults in forward pass
System Info
- transformers version: 4.28.1
- Platform: Linux-5.15.0-1034-azure-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.13.4
- Safetensors version: not installed
- PyTorch version (GPU?): 2.0.0a0+1767026 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
The current version of models.codegen.modeling_codegen.CodeGenAttention throws an error on line 193 of its forward pass when position_ids is not specified and defaults to None. This can be fixed by defining a default position_ids (e.g. as self.position_ids in the init). The issue was introduced in commit 4e94c6c.
@ArthurZucker @younesbelkada
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
import torch
from transformers.models.codegen.modeling_codegen import CodeGenAttention
from transformers import AutoConfig
config = AutoConfig.from_pretrained("Salesforce/codegen-350M-nl")
model = CodeGenAttention(config)
x = torch.randn(4, config.n_ctx, config.n_embd)
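# this call fails: position_ids defaults to None and is not handled inside forward (line 193 in v4.28.1)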
model(x)
Expected behavior
The module should default position_ids to torch.arange(seq_offset, seq_len) when none are passed, instead of raising an error (see the sketch below).
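As a hypothetical sketch only (mirroring how CodeGenModel builds position_ids from the past length; this is not actual library code), such a default inside CodeGenAttention.forward could look like:

if position_ids is None:
    # infer how many positions were already consumed from the cached keys, if any
    past_length = layer_past[0].size(-2) if layer_past is not None else 0
    position_ids = torch.arange(
        past_length, past_length + hidden_states.size(1),
        dtype=torch.long, device=hidden_states.device,
    ).unsqueeze(0)  # shape (1, seq_len), broadcast over the batch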
Hi @sgunasekar, thanks for the issue!
As per my understanding, since CodeGenAttention is not a public class, it should only be used by CodeGenModel. In the modeling script, if position_ids is None, we indeed manually create position_ids based on the past length and sequence length. I personally don't think we should do this inside CodeGenAttention; if you want to use that class as a standalone module, you should manually create position ids and pass them in the forward pass (a sketch follows below).
I also want to hear from @ArthurZucker, @sgugger, and @amyeroberts to make sure we are aligned on this.
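For completeness, a minimal sketch of that workaround (the shape of position_ids here mirrors what CodeGenModel builds when none are passed; treat it as an illustration, not an official API):

import torch
from transformers import AutoConfig
from transformers.models.codegen.modeling_codegen import CodeGenAttention

config = AutoConfig.from_pretrained("Salesforce/codegen-350M-nl")
attn = CodeGenAttention(config)

hidden_states = torch.randn(4, config.n_ctx, config.n_embd)
# build position ids 0 .. seq_len-1 manually, as CodeGenModel does internally
position_ids = torch.arange(config.n_ctx, dtype=torch.long).unsqueeze(0)
outputs = attn(hidden_states, position_ids=position_ids)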
👍🏻 on @younesbelkada's answer. On almost all of our attention modules, these inputs should be passed in explicitly; there is no reason to give them a default value, because this is handled in the modeling file.
Completely in line with the comments above.
+1 - As @younesbelkada says, CodeGenAttention isn't a public class, and this is easily resolved by passing position_ids directly to the layer.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.