Generating_Text_Summary_With_GPT2
About batch_size
Hi, your notebook is an impressive tutorial on using GPT-2 as a seq2seq model.
However, it cannot run with batch_size greater than 1. I think this is because you added sum_idx to the training dataset. In more detail, you wrote shift_labels = labels[..., batch['sum_idx']+1:].contiguous(), but this cannot work on a batch, since PyTorch does not allow slicing a tensor by another tensor like this.
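For reference, here is a quick way to reproduce the failure with made-up shapes (both labels and sum_idx below are stand-ins: labels is a (batch, seq_len) tensor of token ids and sum_idx holds one summary-start index per example):

import torch

labels = torch.randint(0, 50257, (2, 1024))   # stand-in batch of token ids
sum_idx = torch.tensor([400, 512])            # one summary-start index per example

# With batch_size 1 the one-element index tensor is silently converted to a Python int,
# but with batch_size > 1 the slice below errors, because PyTorch tries to convert
# the whole index tensor into a single Python integer.
shift_labels = labels[..., sum_idx + 1:].contiguous()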
I don't know why you chose batch_size=1, but I can run a larger batch_size on my 12 GB NVIDIA GPU. To do that, I think you should define a new feature in your dataset, say labels, and set some of its values to -100, and you should remove the sum_idx feature, since slicing a tensor by another tensor is really awkward. By assigning -100 to those positions, PyTorch's cross-entropy loss will ignore them; check out ignore_index for more details.
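For illustration, here is a minimal sketch of that idea; everything in it (the token ids, the vocabulary size, the random logits standing in for the model output) is made up, not taken from the notebook:

import torch
from torch.nn import CrossEntropyLoss

# One made-up example: article tokens followed by summary tokens.
article_ids = [11, 12, 13, 14]                        # prompt / article part
summary_ids = [21, 22, 23]                            # part the loss should cover
input_ids = torch.tensor([article_ids + summary_ids])

# Labels carry -100 over the article so the loss skips those positions.
labels = torch.tensor([[-100] * len(article_ids) + summary_ids])

logits = torch.randn(1, input_ids.size(1), 50257)     # stand-in for model(input_ids).logits

# Standard causal-LM shift applied to the whole batch at once; no per-example slicing needed.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()
loss_fct = CrossEntropyLoss(ignore_index=-100)        # -100 is also the default ignore_index
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

The same shift works for any batch size, because every example's labels already encode where its summary starts.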
Do you have any solution for this?
I chose a batch size of 1 because that was the largest I could train with on the GPU I had access to. I am busy right now, so I haven't been able to work on it. If you have found a solution, please send a patch and I will review and merge it.
Can you review the following code for loss calculation when batch size > 1?
idx = batch['sum_idx']
b = logits.shape[0]
loss = 0
for i in range(b):
    # compute the loss only over the summary part of each example
    shift_logits = logits[i, idx[i]:-1, :].contiguous()
    shift_labels = labels[i, idx[i] + 1:].contiguous()
    loss += loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
loss = loss / b
loss = loss / args.gradient_accumulation_steps
loss.backward()
The rest follows the original code. Note that gradient_accumulation_steps should be lowered in proportion to the batch size.
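The loop looks right to me. Below is a small self-contained check with dummy tensors that shows it runs for a batch size greater than 1; loss_fct is assumed to be torch.nn.CrossEntropyLoss() and args.gradient_accumulation_steps is replaced by a plain constant:

import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)
batch_size, seq_len, vocab = 2, 16, 50257
logits = torch.randn(batch_size, seq_len, vocab, requires_grad=True)   # stand-in for model output
labels = torch.randint(0, vocab, (batch_size, seq_len))
batch = {'sum_idx': torch.tensor([5, 9])}             # summary start index per example
loss_fct = CrossEntropyLoss()
gradient_accumulation_steps = 1                       # stand-in for args.gradient_accumulation_steps

idx = batch['sum_idx']
b = logits.shape[0]
loss = 0
for i in range(b):
    shift_logits = logits[i, idx[i]:-1, :].contiguous()
    shift_labels = labels[i, idx[i] + 1:].contiguous()
    loss += loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
loss = loss / b
loss = loss / gradient_accumulation_steps
loss.backward()
print(loss.item(), logits.grad.abs().sum().item())

One caveat worth noting: averaging per-example losses weights short and long summaries equally, while the -100 masking approach averages over all summary tokens in the batch, so the two give close but not identical values.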