transformers
Fix bug: Gemma2 — pass the new "sliding_window" parameter to past_key_value.update() so that the _sliding_update function works correctly.
What does this PR do?
System info: transformers 4.42.3

When the Gemma2 model generates text long enough to exceed the sliding window size (>4096 tokens), it raises a CUDA error, which appears to be caused by the _sliding_update function failing in HybridCache. A minimal reproduction:
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning." * 800
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=1150)
print(tokenizer.decode(outputs[0]))
```
```
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
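For context, the sliding-window cache behavior that the `sliding_window` parameter enables can be sketched roughly as follows. This is a minimal, standalone illustration of the idea, not the actual HybridCache implementation: the `sliding_update` function, tensor layout, and shift-by-roll strategy here are all assumptions chosen to show why a missing `sliding_window` argument leads to out-of-bounds writes once generation passes the window size.

```python
import torch

def sliding_update(cache: torch.Tensor, new_states: torch.Tensor,
                   cache_position: torch.Tensor, sliding_window: int) -> torch.Tensor:
    """Hypothetical sketch of a sliding-window KV-cache update.

    cache:          (batch, heads, sliding_window, head_dim) rolling buffer
    new_states:     (batch, heads, 1, head_dim) states for the newest token
    cache_position: 1-element tensor with the token's absolute position
    """
    pos = int(cache_position[0])
    if pos < sliding_window:
        # Buffer not yet full: write the new states in place.
        cache[:, :, pos : pos + 1] = new_states
    else:
        # Buffer full: shift everything left by one slot and append at the
        # end. Without this branch, indexing with the absolute position
        # would write past sliding_window - 1, which surfaces on CUDA as a
        # device-side assert like the one in the traceback above.
        cache = torch.roll(cache, shifts=-1, dims=2)
        cache[:, :, -1:] = new_states
    return cache
```

A quick usage check: with `sliding_window=4`, positions 0–3 fill the buffer, and every later position evicts the oldest entry, so writes never go out of bounds no matter how long generation runs.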
Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
@ArthurZucker
Could you take a look when you have a minute @sanchit-gandhi ?