ALBERT-Pytorch
out of memory error
I'm running classify on the MRPC dataset. The call trainer.train(get_loss, model_file, True) allows only three arguments, not four, so I can't pass the pretrain file.
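For context, if the Trainer here keeps the signature of pytorchic-bert, which this repo is derived from, the pretrain file would be passed by keyword rather than positionally. A hedged sketch (the exact signature is an assumption; check trainer.py, where trainer and get_loss come from classify.py):

```python
# Assumed signature, following pytorchic-bert's Trainer:
#   train(self, get_loss, model_file=None, pretrain_file=None, data_parallel=True)
trainer.train(get_loss,
              model_file=None,              # checkpoint to resume from (optional)
              pretrain_file=pretrain_file,  # pretrained weights to load
              data_parallel=True)
```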
It also runs out of memory:

```
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 4.00 GiB total capacity; 3.02 GiB already allocated; 43.35 MiB free; 223.00 KiB cached)
Iter (loss=X.XXX):   0%|          | 0/115 [00:00<?, ?it/s]
```
Please help.
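A 4 GiB card is tight for a 768-dim model, and a common workaround is to shrink the per-step batch and accumulate gradients so the effective batch size stays the same. A minimal, self-contained PyTorch sketch of the technique (the toy model and loop are placeholders, not this repo's API):

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 2).to(device)                    # toy stand-in for the ALBERT classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4   # effective batch = micro_batch * accum_steps
micro_batch = 8   # small enough to fit in 4 GiB

optimizer.zero_grad()
for step in range(100):                                # placeholder for the MRPC data iterator
    x = torch.randn(micro_batch, 10, device=device)
    y = torch.randint(0, 2, (micro_batch,), device=device)
    loss = loss_fn(model(x), y) / accum_steps          # scale so accumulated grads average correctly
    loss.backward()                                    # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```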
I'm using cfg.hidden instead of cfg.dim, and a dropout probability of 0.5.
@csharma What is your limitation of memory size?
4 GB of CUDA memory.
Also, I'm calling trainer.train(get_loss, model_file, True) with three parameters instead of four; it doesn't take the pretrain_file as a parameter. Could you help me with this?
Fixed this by using an earlier version of ALBERT-Pytorch: https://github.com/graykode/ALBERT-Pytorch/tree/revert-11-feature/fix_seg_pad_bug
It takes the pretrain_file now. I'm only getting a cfg.dim size-mismatch error when loading the checkpoint:

```
size mismatch for embed.tok_embed1.weight: copying a param with shape torch.Size([30522, 24]) from checkpoint, the shape in current model is torch.Size([30522, 128]).
size mismatch for embed.tok_embed2.weight: copying a param with shape torch.Size([64, 24]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for embed.tok_embed2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for embed.pos_embed.weight: copying a param with shape torch.Size([512, 64]) from checkpoint, the shape in current model is torch.Size([512, 768]).
size mismatch for embed.seg_embed.weight: copying a param with shape torch.Size([2, 64]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for embed.norm.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for embed.norm.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_q.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_q.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_k.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_k.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for attn.proj_v.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for attn.proj_v.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm1.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm1.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for pwff.fc1.weight: copying a param with shape torch.Size([256, 64]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for pwff.fc1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for pwff.fc2.weight: copying a param with shape torch.Size([64, 256]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for pwff.fc2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm2.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm2.beta: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([768]).
```
cfg.hidden, which is passed in as cfg.dim, is 768.
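Reading the shapes in the dump, the checkpoint appears to come from a much smaller configuration (embedding 24, hidden 64, feed-forward 256) than the current model (embedding 128, hidden 768, feed-forward 3072), so the fine-tuning config has to match the one used for pretraining. A quick way to see what a checkpoint actually contains (a minimal sketch; the file name is a placeholder, and it assumes the file holds a raw state_dict):

```python
import torch

# Load the checkpoint on CPU and print every parameter's shape,
# then compare against the config used for fine-tuning.
state = torch.load("model_steps_xxx.pt", map_location="cpu")  # placeholder path
for name, tensor in state.items():
    print(f"{name}: {tuple(tensor.shape)}")
```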
Best regards, Cartik