ALBERT-Pytorch
out of memory error
I'm running classify on the MRPC dataset. The call trainer.train(get_loss, model_file, True) allows only three arguments, not four, so I can't pass the pretrain file.
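For context, if the Trainer here keeps the signature of pytorchic-bert, which this repo is derived from, the pretrain file would be passed by keyword rather than positionally. A hedged sketch (the exact signature is an assumption; check trainer.py, where trainer and get_loss come from classify.py):

```python
# Assumed signature, following pytorchic-bert's Trainer:
#   train(self, get_loss, model_file=None, pretrain_file=None, data_parallel=True)
trainer.train(get_loss,
              model_file=None,              # checkpoint to resume from (optional)
              pretrain_file=pretrain_file,  # pretrained weights to load
              data_parallel=True)
```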
It also runs out of memory:

```
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 4.00 GiB total capacity; 3.02 GiB already allocated; 43.35 MiB free; 223.00 KiB cached)
Iter (loss=X.XXX):   0%|          | 0/115 [00:00<?, ?it/s]
```
Please help.
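A 4 GiB card is tight for a 768-dim model, and a common workaround is to shrink the per-step batch and accumulate gradients so the effective batch size stays the same. A minimal, self-contained PyTorch sketch of the technique (the toy model and loop are placeholders, not this repo's API):

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 2).to(device)                    # toy stand-in for the ALBERT classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4   # effective batch = micro_batch * accum_steps
micro_batch = 8   # small enough to fit in 4 GiB

optimizer.zero_grad()
for step in range(100):                                # placeholder for the MRPC data iterator
    x = torch.randn(micro_batch, 10, device=device)
    y = torch.randint(0, 2, (micro_batch,), device=device)
    loss = loss_fn(model(x), y) / accum_steps          # scale so accumulated grads average correctly
    loss.backward()                                    # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```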
I'm using cfg.hidden instead of cfg.dim, and a dropout probability of 0.5.
@csharma What is your limitation of memory size?
4 GB of CUDA memory.
Also, I'm calling trainer.train(get_loss, model_file, True) with three parameters instead of four; it doesn't take the pretrain_file as a parameter. Could you help me with this?
Fixed this by using an earlier version of ALBERT-Pytorch: https://github.com/graykode/ALBERT-Pytorch/tree/revert-11-feature/fix_seg_pad_bug
It takes the pretrain_file now. I'm only getting a cfg.dim size-mismatch error when loading the checkpoint:

```
size mismatch for embed.tok_embed1.weight: copying a param with shape torch.Size([30522, 24]) from checkpoint, the shape in current model is torch.Size([30522, 128]).
size mismatch for embed.tok_embed2.weight: copying a param with shape torch.Size([64, 24]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for embed.tok_embed2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for embed.pos_embed.weight: copying a param with shape torch.Size([512, 64]) from checkpoint, the shape in current model is torch.Size([512, 768]).
size mismatch for embed.seg_embed.weight: copying a param with shape torch.Size([2, 64]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for embed.norm.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for embed.norm.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_q.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_q.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_k.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_k.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_v.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_v.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm1.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm1.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for pwff.fc1.weight: copying a param with shape torch.Size([256, 64]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for pwff.fc1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for pwff.fc2.weight: copying a param with shape torch.Size([64, 256]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for pwff.fc2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm2.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm2.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
```
cfg.hidden, which is passed in as cfg.dim, is 768.
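Reading the shapes in the dump, the checkpoint appears to come from a much smaller configuration (embedding 24, hidden 64, feed-forward 256) than the current model (embedding 128, hidden 768, feed-forward 3072), so the fine-tuning config has to match the one used for pretraining. A quick way to see what a checkpoint actually contains (a minimal sketch; the file name is a placeholder, and it assumes the file holds a raw state_dict):

```python
import torch

# Load the checkpoint on CPU and print every parameter's shape,
# then compare against the config used for fine-tuning.
state = torch.load("model_steps_xxx.pt", map_location="cpu")  # placeholder path
for name, tensor in state.items():
    print(f"{name}: {tuple(tensor.shape)}")
```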
Best regards, Cartik