Using decoder_input_ids with Seq2SeqTrainer.predict()
Hi,
Is there a way to use decoder_input_ids in Seq2SeqTrainer.predict() as in model.generate()? The goal is to generate sentences conditioned on both the encoder input and a decoder input that initializes the generation.
Thank you very much!
cc @sgugger
cc @gante
Hey @zhenduow 👋
This PR, which allows passing decoder_input_ids as part of the input to the Seq2SeqTrainer, was merged after the latest release (v4.28).
Could you try installing from main (pip install --upgrade git+https://github.com/huggingface/transformers.git), and check whether it works correctly on your use case? :)
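For reference, a minimal sketch of what that change enables, assuming trainer is an already-built Seq2SeqTrainer with predict_with_generate=True (the token ids below are purely illustrative):

from datasets import Dataset

# Hypothetical tokenized examples: each row carries the encoder inputs plus
# the decoder prefix that should seed generation.
test_dataset = Dataset.from_dict(
    {
        "input_ids": [[37, 423, 8, 1], [2024, 55, 3, 1]],
        "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 1]],
        "decoder_input_ids": [[0, 1820], [0, 31]],
    }
)

# The decoder_input_ids column is picked up along with the rest of the batch.
predictions = trainer.predict(test_dataset)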
Hi @gante ,
Thank you very much for the reply! I have checked the PR and I have a further question.
So I should pass the decoder_input_ids to model.generate() via inputs['decoder_input_ids'] within Seq2SeqTrainer, is that right?
To do this, I need to batch the decoder_input_ids into a tensor, which requires padding or truncating them. However, my decoder_input_ids have varying lengths, which causes an error when batching them into a tensor.
For example, my decoder_input_ids looks like:
[
[1,2,3],
[4,5],
[6]
]
This cannot be converted to a tensor because the lengths of the three lists do not match.
Is there a way to solve this problem? Thank you very much!
@zhenduow you probably need to pad decoder_input_ids -- see this guide
BTW, as per our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum 🤗
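A minimal sketch of one way to do that padding, assuming a T5-style tokenizer (the checkpoint name is illustrative); tokenizer.pad turns the ragged lists from the example above into a rectangular tensor:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # illustrative checkpoint

# The ragged decoder prompts from the example above.
decoder_prompts = [[1, 2, 3], [4, 5], [6]]

# tokenizer.pad fills the shorter rows with tokenizer.pad_token_id and also
# returns the matching attention mask.
padded = tokenizer.pad(
    {"input_ids": decoder_prompts},
    padding=True,
    return_tensors="pt",
)
decoder_input_ids = padded["input_ids"]          # shape (3, 3)
decoder_attention_mask = padded["attention_mask"]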
Thank you! I should ask this in the forum.
Thank you! I solved the tensor problem with padding and got results.
However, my results do not start with the decoder_input_ids. I want to double-check, in case this is a bug:
Do I need to pass any additional argument to Seq2SeqTrainer (which will tell the decoder to start with the given ids) besides adding decoder_input_ids as a key in the dataset dictionary?
Try passing labels and decoder_input_ids: if my memory is correct, the former will be used to obtain the evaluation metrics, and the latter as the prompt for the decoder.
Thank you for the suggestion!
I tried to pass the decoder_input_ids to the forward function, but because I use the trainer, I don't have control over the model() call. I can only add decoder_input_ids as a key in the model input dictionary, and that does not seem to work.
I dug into the code and found this line in predict() in trainer.py:
https://github.com/huggingface/transformers/blob/15f260a82f98788354d55cb2788e9f0b5131fb77/src/transformers/trainer.py#LL3101C1-L3101C1
test_dataloader = self.get_test_dataloader(test_dataset)
This line of code changes my test_dataset['decoder_input_ids'] from my custom decoder prompts to shifted labels.
Can you please check if this is intended or a bug? Why is this the case?
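One quick way to see what actually reaches the model is to pull a single batch from that dataloader and compare it with the dataset; a small sketch assuming the trainer and test_dataset from above:

test_dataloader = trainer.get_test_dataloader(test_dataset)
first_batch = next(iter(test_dataloader))

# Compare what the collator produced with what was stored in the dataset.
print(first_batch["decoder_input_ids"])    # what the model will receive
print(test_dataset["decoder_input_ids"])   # the custom decoder prompts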
I was not sure of the behavior; it seems my memory was incorrect :) Alternatively, this one will work for sure: you can set forced_decoder_ids (docs), which will force the tokens you specify at the positions you define. You can use it to force a starting sequence, assuming it is the same for all members of the batch.
Thanks! Can you please explain how I can use forced_decoder_ids with trainer?
It seems like I cannot call the generate() function anywhere, only the model() function.
Can I use forced_decoder_ids with model()?
@zhenduow you can define a generation config (docs 1 docs 2) and pass it to the trainer (see here).
If you parameterize forced_decoder_ids in the generation config, it will be passed to .generate at evaluation time.
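A sketch of that setup, assuming a recent transformers release in which Seq2SeqTrainingArguments exposes a generation_config argument (the checkpoint and forced token id are illustrative):

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    GenerationConfig,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Force token id 123 at decoder position 1 for every sequence in the batch.
gen_config = GenerationConfig.from_model_config(model.config)
gen_config.forced_decoder_ids = [[1, 123]]

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config=gen_config,   # assumes a version exposing this argument
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    # eval_dataset / a test_dataset for trainer.predict would be supplied here
)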
I did as you suggested and printed:
print(trainer.model.generation_config)
which shows:
GenerationConfig {
"_from_model_config": true,
"decoder_start_token_id": 0,
"eos_token_id": 1,
"forced_decoder_ids": [
[
1,
123
]
],
"pad_token_id": 0,
"transformers_version": "4.29.0.dev0"
}
The [1, 123] entry is just for testing. However, the generation is still the same as before. Is there anything wrong here?
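One way to narrow this down might be to call generate() directly with the same generation config, outside the trainer, and check whether the forced token shows up there (sketch; model, tokenizer and gen_config as in the earlier example):

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
out = model.generate(**inputs, generation_config=gen_config, max_new_tokens=20)
print(tokenizer.batch_decode(out, skip_special_tokens=False))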
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.