zaidato
@kadirnar No, I didn't load a pretrained model. Do you mean your total batch_size is 16 (across 8 GPUs), or that each GPU has batch_size=16 (a total batch size of 16 * 8 = 128)?
You got the error at epoch 50. I think that's because you set `TMA_epoch: 50` (the TMA starting epoch for the 1st stage): once TMA training begins at that epoch, memory use goes up, so you need to decrease the batch size to fix it...
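To be concrete, I mean the config fields mentioned in this thread; the values below are just illustrative, not the repo's defaults:

```yaml
# Sketch only: field names are from this thread, values are assumptions.
batch_size: 8    # lower this if you OOM once TMA training starts
TMA_epoch: 50    # TMA starting epoch (1st stage)
```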
I faced the same problem when using FP16.
@kadirnar What does context length mean here? In your repo you set batch_size: 64 and max_len: 560. How can you increase these values without running out of memory?
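For context, my rough understanding is that activation memory scales with roughly batch_size * max_len, so raising one usually means lowering the other. Something like this hypothetical trade-off (illustrative values, not from the repo's config):

```yaml
# Sketch: trading batch size for context length. Values are
# illustrative assumptions, not the repo's settings.
batch_size: 32   # halved from 64...
max_len: 1120    # ...so max_len can roughly double at similar memory
```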