BART generate with min_new_tokens exceeds maximum length
System Info
- `transformers` version: 4.40.2
- Platform: Linux-4.18.0-477.36.1.el8_8.x86_64-x86_64-with-glibc2.28
- Python version: 3.10.14
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.3
- Accelerate version: 0.30.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@ArthurZucker @younesbelkada @gante
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
If I load a fine-tuned `BartForConditionalGeneration` model and then try to generate text with it, I hit the following warning, and generation then fails:

> This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
Generation code:

```python
outputs = model.generate(
    input_ids,
    attention_mask=attention_mask,
    num_beams=3,
    min_new_tokens=1500,
    max_new_tokens=2500,
    # stopping_criteria=stopping_criteria,
    early_stopping=True,
)
```
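For completeness, here is a minimal self-contained sketch that triggers the same warning for me. Since my fine-tuned checkpoint isn't included here, it uses the public `facebook/bart-base` checkpoint as a stand-in (it has the same 1024-position limit):

```python
from transformers import AutoTokenizer, BartForConditionalGeneration

# NOTE: stand-in checkpoint; the actual model in this report is a private
# fine-tuned BartForConditionalGeneration with the same config limits.
model_name = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to("cuda")

inputs = tokenizer("Some input text for the encoder.", return_tensors="pt").to("cuda")

# Asking for at least 1500 new tokens pushes the decoder past 1024 positions.
outputs = model.generate(
    **inputs,
    num_beams=3,
    min_new_tokens=1500,
    max_new_tokens=2500,
    early_stopping=True,
)
```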
I was under the impression that, since the BART decoder generates autoregressively, there was no hard limit on how long its output could be. Is that not the case?
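From what I can tell, the limit comes not from the autoregressive loop itself but from BART's learned positional embeddings, which are a fixed-size lookup table. A quick sanity check (a sketch, reusing the `model` object from above):

```python
# BART uses *learned* positional embeddings rather than sinusoidal ones, so
# the decoder can only look up positions that have rows in the table.
print(model.config.max_position_embeddings)  # 1024 for standard BART configs

# The learned position table in the decoder (BartLearnedPositionalEmbedding);
# its first dimension is max_position_embeddings plus a small internal offset.
print(model.model.decoder.embed_positions.weight.shape)

# Generating past that many decoder positions indexes out of bounds in this
# table, which on GPU surfaces as a CUDA device-side assert / indexing error.
```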
Expected behavior
Generation of arbitrary length completes without a CUDA or out-of-bounds indexing error.