T5 fine-tuning for summarization: decoder_input_ids and labels
Hello @abhimishra91,
I was trying to implement the fine-tuning of T5 as explained in your notebook.
In addition to implementing the same structure as yours, I have run some experiments with the HuggingFace Trainer class. The decoder_input_ids and labels parameters are not very clear to me. When you train the model, you do this:
```python
y = data['target_ids'].to(device, dtype=torch.long)
y_ids = y[:, :-1].contiguous()
lm_labels = y[:, 1:].clone().detach()
lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
```
where y_ids is the decoder_input_ids. I don't understand why this preprocessing is needed. Could you explain why you skip the last token of the target_ids, and why you replace the pads with -100 in the labels?
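For context, here is a toy illustration of what that preprocessing produces. The token ids are made up, and I am relying on the fact that PyTorch's CrossEntropyLoss ignores target positions equal to -100 by default:

```python
import torch

pad_id = 0  # T5's pad token id in the standard checkpoints
# made-up target sequence: [pad, 71, 72, 73, pad]
y = torch.tensor([[pad_id, 71, 72, 73, pad_id]])

y_ids = y[:, :-1].contiguous()         # decoder_input_ids -> [[0, 71, 72, 73]]
lm_labels = y[:, 1:].clone().detach()  # labels            -> [[71, 72, 73, 0]]
lm_labels[y[:, 1:] == pad_id] = -100   # pad positions     -> [[71, 72, 73, -100]]
```

As far as I can tell, each decoder input position then lines up with the next token of the target, and the -100 positions are excluded from the loss.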
When I use the HuggingFace Trainer, I need to tweak the __getitem__ method of my Dataset like this:
```python
def __getitem__(self, idx):
    ...
    item['decoder_input_ids'] = y[:-1]                # drop the last target token
    lbl = y[1:].clone()                               # shift the labels left by one
    lbl[y[1:] == self.tokenizer.pad_token_id] = -100  # ignore pads in the loss
    item['labels'] = lbl
    return item
```
Otherwise the loss does not decrease over time.
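For completeness, here is a minimal self-contained sketch of the Dataset I am describing. The class name, column names, tokenizer call, and max lengths are placeholders I chose for illustration; the label shift follows the same pattern as your notebook:

```python
import torch
from torch.utils.data import Dataset


class SummarizationDataset(Dataset):
    """Pairs of (document, summary) tokenized for T5 fine-tuning."""

    def __init__(self, texts, summaries, tokenizer, max_source_len=512, max_target_len=150):
        self.texts = texts
        self.summaries = summaries
        self.tokenizer = tokenizer
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        source = self.tokenizer(
            self.texts[idx], max_length=self.max_source_len,
            padding="max_length", truncation=True, return_tensors="pt",
        )
        target = self.tokenizer(
            self.summaries[idx], max_length=self.max_target_len,
            padding="max_length", truncation=True, return_tensors="pt",
        )
        y = target["input_ids"].squeeze(0)

        item = {
            "input_ids": source["input_ids"].squeeze(0),
            "attention_mask": source["attention_mask"].squeeze(0),
            "decoder_input_ids": y[:-1],  # drop the last target token
        }
        labels = y[1:].clone()            # shift the labels left by one
        labels[y[1:] == self.tokenizer.pad_token_id] = -100  # pads do not count in the loss
        item["labels"] = labels
        return item


# usage sketch (placeholder model name):
# tokenizer = T5Tokenizer.from_pretrained("t5-small")
# train_dataset = SummarizationDataset(train_texts, train_summaries, tokenizer)
```

If I understand the library correctly, recent versions of transformers can also build decoder_input_ids internally from labels when only labels are passed to T5ForConditionalGeneration, so the manual shift above may not be strictly necessary with the Trainer.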
Thank you for your help!
Hi, @marcoabrate!
I am also having trouble computing the loss. Can you share the full code for your training? Did you use multi-GPU?
Hi @Gorodecki, I have abandoned this code, since there are many seq2seq training and testing examples in the HuggingFace library itself; you can check them out here: https://github.com/huggingface/transformers/tree/master/examples/seq2seq
I was not using multi-GPU. Hope this helps!
@Gorodecki, @marcoabrate: so far I have found this one useful: https://github.com/huggingface/notebooks/blob/master/examples/summarization.ipynb