aitextgen
Colab OOM finetuning GPT-Neo (both 125M and 350M) on both T4 and P100
Using Colab:
I get OOM finetuning GPT-Neo (both 125M and 350M) on both T4 and P100 GPUs. The problem persists even with fp16 enabled.
GPT-2, on the other hand, works fine.
/usr/local/lib/python3.7/dist-packages/transformers/models/gpt_neo/modeling_gpt_neo.py in _attn(self, query, key, value, causal_mask, masked_bias, attn_dropout, attention_mask, head_mask)
235
236 attn_weights = torch.matmul(query, key.transpose(-1, -2))
--> 237 attn_weights = torch.where(causal_mask, attn_weights, masked_bias.to(attn_weights.dtype))
238
239 if attention_mask is not None:
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 61.75 MiB free; 14.96 GiB reserved in total by PyTorch)
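The failing line materializes the full attention-weight matrix, which is where the memory goes. A rough back-of-envelope sketch (the head count of 16 for the 350M checkpoint is an assumption; the 2048-token context window is from the warning below) shows why even one batch element is expensive in fp32:

```python
# Rough size of one layer's attention-weight tensor for a single
# batch element: heads x seq_len x seq_len floats.
heads = 16        # assumed head count for GPT-Neo 350M
seq_len = 2048    # GPT-Neo context window
bytes_fp32 = 4
bytes_fp16 = 2

attn_fp32 = heads * seq_len * seq_len * bytes_fp32
attn_fp16 = heads * seq_len * seq_len * bytes_fp16

print(attn_fp32 / 2**20)  # 256.0 MiB per layer, per batch element, in fp32
print(attn_fp16 / 2**20)  # 128.0 MiB in fp16
```

And the torch.where in the traceback creates additional temporaries of the same shape, so with a card already at ~15 GiB allocated, a 192 MiB request can easily fail.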
That's weird. Are you changing any other training settings?
Everything else is at defaults. I tried again using a fresh copy of your notebook and 125M now works, but 350M still OOMs.
Having this issue too. If it matters, I'm using a pretty large text file (~20 MB) as the dataset, and I'm also getting this warning a short while after training starts:
Token indices sequence length is longer than the specified maximum sequence length for this model (2385 > 2048). Running this sequence through the model will result in indexing errors
This also happened in my attempts to train GPT-Neo locally, so it doesn't seem like it's endemic to Colab.
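That warning fires when a single training sample tokenizes to more than the model's 2048-token context. One workaround is to pre-chunk long samples before training. A minimal sketch, using whitespace splitting as a stand-in tokenizer (a real run would tokenize with the GPT-Neo tokenizer instead):

```python
def chunk_tokens(tokens, max_len=2048):
    """Split a token list into consecutive windows of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# Stand-in "tokens": whitespace-split words, just to illustrate the shape.
sample = "word " * 2385  # mirrors the 2385-token sample from the warning
tokens = sample.split()
print([len(c) for c in chunk_tokens(tokens)])  # [2048, 337]
```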
That error just means one of your training samples has a token count larger than the model's context window; it's not the same as a GPU OOM. I recommend running the tokenizer over your dataset to find whatever sequence is causing it.
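One way to act on that suggestion: scan the dataset and report any sample over the limit. A hedged sketch with a pluggable tokenize function; the whitespace stand-in below would be swapped for the model's real tokenizer (e.g., with transformers installed, something like GPT2TokenizerFast.from_pretrained and its input_ids):

```python
def find_overlong(samples, tokenize, max_len=2048):
    """Return (index, token_count) for every sample exceeding max_len tokens."""
    hits = []
    for i, text in enumerate(samples):
        n = len(tokenize(text))
        if n > max_len:
            hits.append((i, n))
    return hits

# Stand-in tokenizer: whitespace split, for illustration only.
samples = ["a short line", "word " * 3000]
print(find_overlong(samples, str.split))  # [(1, 3000)]
```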
Alright, I'll check that out, but I am also definitely OOMing