Using decoder_input_ids with Seq2SeqTrainer.predict()
Hi,
Is there a way to use decoder_input_ids in Seq2SeqTrainer.predict() as in model.generate()? The goal is to generate sentences conditioned on both the encoder input and a decoder input that initializes the generation.
Thank you very much!
cc @sgugger
cc @gante
Hey @zhenduow 👋
This PR, which allows passing decoder_input_ids as part of the input to the Seq2SeqTrainer, was merged after the latest release (v4.28).
Could you try installing from main (pip install --upgrade git+https://github.com/huggingface/transformers.git), and check whether it works correctly on your use case? :)
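For reference, a minimal sketch of what that change enables, assuming trainer is an already-built Seq2SeqTrainer with predict_with_generate=True (the token ids below are purely illustrative):

from datasets import Dataset

# Hypothetical tokenized examples: each row carries the encoder inputs plus
# the decoder prefix that should seed generation.
test_dataset = Dataset.from_dict(
    {
        "input_ids": [[37, 423, 8, 1], [2024, 55, 3, 1]],
        "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 1]],
        "decoder_input_ids": [[0, 1820], [0, 31]],
    }
)

# The decoder_input_ids column is picked up along with the rest of the batch.
predictions = trainer.predict(test_dataset)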
Hi @gante ,
Thank you very much for the reply! I have checked the PR and I have a further question.
So I should pass the decoder_input_ids to model.generate() via inputs['decoder_input_ids'] within Seq2SeqTrainer, is that right?
To do this, I need to batch the decoder_input_ids into a tensor, which requires padding or truncating them. However, my decoder_input_ids have varying lengths, which causes an error when batching them into a tensor.
For example, my decoder_input_ids looks like:
[
[1,2,3],
[4,5],
[6]
]
This cannot be converted to a tensor because the lengths of the three lists do not match.
Is there a way to solve this problem? Thank you very much!
@zhenduow you probably need to pad decoder_input_ids -- see this guide
BTW, as per our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum 🤗
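A minimal sketch of one way to do that padding, assuming a T5-style tokenizer (the checkpoint name is illustrative); tokenizer.pad turns the ragged lists from the example above into a rectangular tensor:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # illustrative checkpoint

# The ragged decoder prompts from the example above.
decoder_prompts = [[1, 2, 3], [4, 5], [6]]

# tokenizer.pad fills the shorter rows with tokenizer.pad_token_id and also
# returns the matching attention mask.
padded = tokenizer.pad(
    {"input_ids": decoder_prompts},
    padding=True,
    return_tensors="pt",
)
decoder_input_ids = padded["input_ids"]          # shape (3, 3)
decoder_attention_mask = padded["attention_mask"]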
Thank you! I should ask this in the forum.
Thank you! I solved the tensor problem with padding and got results.
However, my results do not start with the decoder_input_ids. I want to double-check, in case this is a bug:
Do I need to pass any additional argument to Seq2SeqTrainer (which will tell the decoder to start with the given ids) besides adding decoder_input_ids as a key in the dataset dictionary?
Try passing labels and decoder_input_ids: if my memory is correct, the former will be used to obtain the evaluation metrics, and the latter as the prompt for the decoder.
Thank you for the suggestion!
I tried to pass the decoder_input_ids to the forward function, but because I use the trainer, I don't have control over the model() call. I can only add decoder_input_ids as a key in the model input dictionary, and that does not seem to work.
I dug into the code and found this line in predict() in trainer.py:
https://github.com/huggingface/transformers/blob/15f260a82f98788354d55cb2788e9f0b5131fb77/src/transformers/trainer.py#LL3101C1-L3101C1
test_dataloader = self.get_test_dataloader(test_dataset)
This line of code changes my test_dataset['decoder_input_ids'] from my custom decoder prompts to shifted labels.
Can you please check if this is intended or a bug? Why is this the case?
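One quick way to see what actually reaches the model is to pull a single batch from that dataloader and compare it with the dataset; a small sketch assuming the trainer and test_dataset from above:

test_dataloader = trainer.get_test_dataloader(test_dataset)
first_batch = next(iter(test_dataloader))

# Compare what the collator produced with what was stored in the dataset.
print(first_batch["decoder_input_ids"])    # what the model will receive
print(test_dataset["decoder_input_ids"])   # the custom decoder prompts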
I was not sure of the behavior; it seems my memory was incorrect :) Alternatively, this one will work for sure: you can set forced_decoder_ids (docs), which will force the tokens you specify at the positions you define. You can use it to force a starting sequence, assuming it is the same for all members of the batch.
Thanks! Can you please explain how I can use forced_decoder_ids with trainer?
It seems like I cannot call the generate() function anywhere, only the model() function.
Can I use forced_decoder_ids with model()?
@zhenduow you can define a generation config (docs 1 docs 2) and pass it to the trainer (see here).
If you parameterize forced_decoder_ids in the generation config, it will be passed to .generate at evaluation time.
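A sketch of that setup, assuming a recent transformers release in which Seq2SeqTrainingArguments exposes a generation_config argument (the checkpoint and forced token id are illustrative):

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    GenerationConfig,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Force token id 123 at decoder position 1 for every sequence in the batch.
gen_config = GenerationConfig.from_model_config(model.config)
gen_config.forced_decoder_ids = [[1, 123]]

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config=gen_config,   # assumes a version exposing this argument
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    # eval_dataset / a test_dataset for trainer.predict would be supplied here
)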
I did as you suggested and printed:
print(trainer.model.generation_config)
which shows:
GenerationConfig {
"_from_model_config": true,
"decoder_start_token_id": 0,
"eos_token_id": 1,
"forced_decoder_ids": [
[
1,
123
]
],
"pad_token_id": 0,
"transformers_version": "4.29.0.dev0"
}
The [1, 123] entry is just for testing. However, the generation is still the same as before. Is there anything wrong here?
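One way to narrow this down might be to call generate() directly with the same generation config, outside the trainer, and check whether the forced token shows up there (sketch; model, tokenizer and gen_config as in the earlier example):

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
out = model.generate(**inputs, generation_config=gen_config, max_new_tokens=20)
print(tokenizer.batch_decode(out, skip_special_tokens=False))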
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.