zaidato
@kadirnar No, I didn't load a pretrained model. Do you mean your total batch_size is 16 (across 8 GPUs), or that each GPU has batch_size=16 (a total batch size of 16 * 8 = 128)?
You got the error at epoch 50. I think that's because you set `TMA_epoch: 50` (the TMA starting epoch for the 1st stage): once TMA training begins at that epoch, memory use goes up, so you need to decrease the batch size to fix it...
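To be concrete, I mean the config fields mentioned in this thread; the values below are just illustrative, not the repo's defaults:

```yaml
# Sketch only: field names are from this thread, values are assumptions.
batch_size: 8    # lower this if you OOM once TMA training starts
TMA_epoch: 50    # TMA starting epoch (1st stage)
```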
I faced the same problem when using FP16.
@kadirnar What does context length mean here? In your repo you set batch_size: 64 and max_len: 560. How can you increase these values without running out of memory?
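For context, my rough understanding is that activation memory scales with roughly batch_size * max_len, so raising one usually means lowering the other. Something like this hypothetical trade-off (illustrative values, not from the repo's config):

```yaml
# Sketch: trading batch size for context length. Values are
# illustrative assumptions, not the repo's settings.
batch_size: 32   # halved from 64...
max_len: 1120    # ...so max_len can roughly double at similar memory
```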