Generating_Text_Summary_With_GPT2

About batch_size

Open duongkstn opened this issue 3 years ago • 3 comments

Hi, your notebook is an impressive tutorial on using GPT-2 as a seq2seq model. However, it cannot run with a batch_size greater than 1. I think this is because you added sum_idx to the training dataset. In more detail, you wrote `shift_labels = labels[..., batch['sum_idx']+1:].contiguous()`, but this cannot be applied to a whole batch, since PyTorch does not allow slicing a tensor with another tensor like this.
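For illustration, here is a minimal snippet (the shapes and `sum_idx` values are made up, not taken from the notebook) showing why a per-example index tensor cannot be used as a slice bound:

```python
import torch

# Hypothetical batch of 2 label sequences and a per-example summary start index.
labels = torch.randint(0, 50257, (2, 10))
sum_idx = torch.tensor([3, 5])

# With batch_size == 1, sum_idx holds a single value, so PyTorch can convert
# it to a Python int and use it as a slice bound.
# With batch_size > 1, the line below raises an error, because a slice bound
# must be a single integer, not a tensor of per-example positions:
# shift_labels = labels[..., sum_idx + 1:].contiguous()
```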

I don't know why you chose batch_size=1, but I can run a larger batch size on my 12 GB NVIDIA GPU. To do that, I think you should define a new feature for your dataset, say labels, and set some of its values to -100. You should also remove the sum_idx feature, since slicing a tensor this way is really hard. By assigning -100 to those positions, PyTorch's cross-entropy loss will ignore them. Check out ignore_index for more details.
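To make this concrete, here is a minimal sketch of the idea (the names `input_ids`, `sum_idx`, and `pad_token_id` are illustrative, not taken from the notebook): mask everything before the summary start with -100 so that `CrossEntropyLoss` (whose default ignore_index is -100) only scores the summary tokens, with no per-example slicing needed at training time.

```python
import torch
from torch.nn import CrossEntropyLoss

def build_labels(input_ids, sum_idx, pad_token_id):
    # Copy the inputs, then mask the article/separator tokens and any
    # padding with -100, which CrossEntropyLoss ignores by default.
    labels = input_ids.clone()
    for i, start in enumerate(sum_idx):
        labels[i, : start + 1] = -100
    labels[input_ids == pad_token_id] = -100
    return labels

# Usage sketch, with model logits of shape (batch, seq_len, vocab_size):
# shift_logits = logits[:, :-1, :].contiguous()
# shift_labels = labels[:, 1:].contiguous()
# loss = CrossEntropyLoss()(
#     shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
# )
```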

duongkstn avatar May 27 '22 04:05 duongkstn

Did you find a solution for this?

hoangthangta avatar Jun 08 '22 10:06 hoangthangta

I chose a batch size of 1 because I was only able to train with a batch size of 1 on the GPU I had access to. I am busy right now, so I couldn't work on it. If you have found a solution, please send a patch and I will review and merge it.

SKRohit avatar Jan 14 '23 08:01 SKRohit

Can you review the following code for loss calculation when batch size > 1?

```python
idx = batch['sum_idx']
b = logits.shape[0]
loss = 0
for i in range(0, b):
    # Score only the summary portion of each example, shifting logits and
    # labels by one position for next-token prediction.
    shift_logits = logits[i, idx[i]:-1, :].contiguous()
    shift_labels = labels[i, idx[i] + 1:].contiguous()
    loss += loss_fct(shift_logits.view(-1, shift_logits.size(-1)),
                     shift_labels.view(-1))
loss = loss / b  # average over the batch
loss = loss / args.gradient_accumulation_steps
loss.backward()
```

The rest follows the original code. gradient_accumulation_steps should be lowered in proportion to the batch size, so that the effective batch size (batch_size × gradient_accumulation_steps) stays the same.
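For example (illustrative numbers only): if the original setup used batch_size=1 with gradient_accumulation_steps=32, then batch_size=4 with gradient_accumulation_steps=8 keeps the same effective batch size of 32.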

ankur6ue avatar Apr 07 '23 14:04 ankur6ue