input_ids_seq_length is always 1
System Info
- transformers version: 4.26.1
- Platform: Linux-5.4.0-113-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.12.1
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): 2.8.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.5.0 (gpu)
- Jax version: 0.3.13
- JaxLib version: 0.3.10
- Using GPU in script?: yes
I am trying to generate output that is equal in length to the input (partially to avoid hallucinations and repetitions). In src/transformers/generation/utils.py I read how input length is determined: If self.config.is_encoder_decoder (which is the case for me), input_ids_seq_length calculates the length of the input ids coming from _prepare_decoder_input_ids_for_generation, which makes a tensor with dimension (batch_size, 1) filled with start_tokens. This means the input_ids_seq_length is always 1, making it useless for determining the input length (and determining the output length based on that).
Who can help?
@sgugger @muellerzr
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
The problem arises in a script of my own, but this example also highlights it. (The task I am working on is not summarization but grammar correction, which is why I want the output length to be equal to the input length.)
from transformers import AutoTokenizer, T5ForConditionalGeneration, GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
config = GenerationConfig(max_new_tokens=0)
input_ids = tokenizer("summarize: My friends are cool but they eat too many carbs.", return_tensors="pt").input_ids
outputs = model.generate(input_ids, generation_config=config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Expected behavior
I would expect the output length to be determined by the input length + max_new_tokens: generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
This is the case, but input_ids_seq_length is (wrongfully) always 1, making the output length independent of the input and equal to max_new_tokens+1.
cc @gante
Hey @ChrisSpraaklab 👋 In both types of models, input_ids_seq_length is relative to the output of the model, which is different for encoder-decoder (does not contain the prompt) and decoder-only models (contains the prompt). I agree that we might benefit from a rework there, for clarity :)
In any case, let's sort out your immediate issue! As the argument name indicates, max_new_tokens will make the model generate up to max_new_tokens new tokens. As such, if you want to generate an output equal in length to the input, you'll have to set max_new_tokens=input_ids.shape[1].
Also, bear in mind that encoder-decoder models ALWAYS start the output with a BOS token. As such, the length of the output will be the length of the input + 1.
@gante Thanks for your quick response. However, what I mean is that when input_ids_seq_length is set to input_ids.shape[-1], this value is always equal to 1 (as it comes from _prepare_decoder_input_ids_for_generation).
# 5. Prepare `input_ids` which will be used for auto-regressive generation
if self.config.is_encoder_decoder:
    input_ids = self._prepare_decoder_input_ids_for_generation(
        batch_size,
        decoder_start_token_id=generation_config.decoder_start_token_id,
        bos_token_id=generation_config.bos_token_id,
        model_kwargs=model_kwargs,
        device=inputs_tensor.device,
    )
else:
    input_ids = inputs_tensor if model_input_name == "input_ids" else model_kwargs.pop("input_ids")
# 6. Prepare `max_length` depending on other stopping criteria.
input_ids_seq_length = input_ids.shape[-1]
has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
if has_default_max_length and generation_config.max_new_tokens is None:
    warnings.warn(
        f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. "
        "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we"
        " recommend using `max_new_tokens` to control the maximum length of the generation.",
        UserWarning,
    )
elif generation_config.max_new_tokens is not None:
    generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
    if not has_default_max_length:
        logger.warn(
            f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
            f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
            "Please refer to the documentation for more information. "
            "(https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)",
            UserWarning,
        )
In my understanding, doing as you suggested would make this line equivalent to 1+1, as max_new_tokens=input_ids.shape[1] (equal to 1) and input_ids_seq_length = input_ids.shape[-1] (equal to 1):
generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
@ChrisSpraaklab inside generate, in encoder-decoder models like T5, input_ids refers to the decoder input ids. They are not the same as the input_ids you feed to .generate(), which will be used inside the encoder. Sadly, because .generate() is used with many types of models, we have this naming clash :)
Have you tried running
from transformers import AutoTokenizer, T5ForConditionalGeneration, GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
input_ids = tokenizer("summarize: My friends are cool but they eat too many carbs.", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=input_ids.shape[1])
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
?
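To make the naming clash concrete, here is a minimal pure-Python sketch of the arithmetic in the excerpt quoted earlier. It does not call transformers at all: plain lists stand in for tensors, and the helper names decoder_start_ids and resolve_max_length are hypothetical, introduced only for illustration. It assumes the behavior described in this thread, namely that for encoder-decoder models the internal input_ids is a (batch_size, 1) block of start tokens, while for decoder-only models it is the prompt itself.

```python
from typing import List


def decoder_start_ids(batch_size: int, start_token_id: int) -> List[List[int]]:
    """Hypothetical stand-in for the (batch_size, 1) block of start tokens."""
    return [[start_token_id] for _ in range(batch_size)]


def resolve_max_length(
    prompt_ids: List[List[int]],
    max_new_tokens: int,
    is_encoder_decoder: bool,
    start_token_id: int = 0,  # assumed value, only used for illustration
) -> int:
    """Mirror `max_length = max_new_tokens + input_ids_seq_length`."""
    if is_encoder_decoder:
        # The prompt goes to the encoder; the decoder starts from one token.
        input_ids = decoder_start_ids(len(prompt_ids), start_token_id)
    else:
        # Decoder-only: the prompt itself is the start of the output sequence.
        input_ids = prompt_ids
    input_ids_seq_length = len(input_ids[0])
    return max_new_tokens + input_ids_seq_length


# A made-up batch with one prompt of 8 token ids.
prompt = [[11, 12, 13, 14, 15, 16, 17, 18]]
prompt_len = len(prompt[0])

# What the user passes as `max_new_tokens=input_ids.shape[1]` is the prompt length (8), not 1.
enc_dec_max_length = resolve_max_length(prompt, prompt_len, is_encoder_decoder=True)
dec_only_max_length = resolve_max_length(prompt, prompt_len, is_encoder_decoder=False)
zero_budget = resolve_max_length(prompt, 0, is_encoder_decoder=True)

print(enc_dec_max_length, dec_only_max_length, zero_budget)
```

In this sketch the encoder-decoder case yields prompt length + 1 (the start token plus up to 8 new tokens), which matches the "length of the input + 1" remark above, and max_new_tokens=0 leaves room for the start token only.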
Thanks! Your solution does indeed produce the result I was looking for. I was just quite confused about the naming convention and documentation around max_new_tokens. I was under the impression that its value would be added to the length of the encoder input, not the decoder input. However, I now understand why it doesn't behave as I expected it to.
So... even though we pass a token sequence input_ids to the generate() function, its length is irrelevant in the encoder-decoder model, and max_new_tokens in generate() is counted relative to the decoder input, whose length, because of BOS, is always 1 in our case. Yes, this is somewhat confusing indeed.
Are there ways to motivate generate() to be more concise, but still run until EOS is generated, e.g., by setting a prior on the EOS?
Hey @davidavdav -- yeah, you can try using Beam Search (i.e. num_beams>1) and pass a NEGATIVE length_penalty. This will nudge the output towards shorter outputs!
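A rough sketch of why a negative length_penalty favors shorter beams. It assumes the normalization described in the transformers documentation for length_penalty, where a finished beam is scored as the sum of its token log-probabilities divided by length ** length_penalty. The function beam_score and the two example hypotheses are made up for illustration and are not library code.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Assumed scoring rule: sum of log-probs divided by length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up finished hypotheses: (sum of log-probabilities, length in tokens).
hypotheses = {
    "short": (-4.0, 5),
    "long": (-6.0, 12),
}


def best_hypothesis(length_penalty: float) -> str:
    """Return the name of the hypothesis with the highest score."""
    return max(
        hypotheses,
        key=lambda name: beam_score(*hypotheses[name], length_penalty),
    )


for penalty in (1.0, 0.0, -1.0):
    scores = {name: beam_score(*hyp, penalty) for name, hyp in hypotheses.items()}
    print(penalty, scores, "->", best_hypothesis(penalty))
```

Because log-probabilities are negative, dividing by a growing power of the length (positive penalty) pulls long beams towards zero and makes them win, while a negative penalty multiplies the magnitude by the length and makes long beams lose. With the real API this corresponds to something like model.generate(input_ids, num_beams=4, length_penalty=-1.0), where the specific values 4 and -1.0 are only an example and generation still stops at EOS.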
BTW, if you come across better variable names, by all means, please suggest them :) We have so many features on our to-do list (including better docs) that every little help is precious!
Ah thanks, @gante---I do appreciate the difficulty of choosing sensible parameter/variable names, the number of times I am refactoring names back and forth in my own code is quite scary!