srntt-pytorch

How much memory does running train.py use?

Open 6ABCD0 opened this issue 3 years ago • 7 comments

I trained it on a TITAN Xp (12 GB), but it failed with an 'out of memory' error. I found that memory usage grows with the number of forward passes. In other words, when I run "python3 train.py --use_weights --netG_pre ./pretrain_model/netG_100.pth --netD_pre ./pretrain_model/netD_100.pth", memory usage is 5 GB at the start, climbs to 12 GB as training continues, and finally exceeds 12 GB. @S-aiueo32 [screenshot of the error]

6ABCD0 avatar Nov 14 '20 13:11 6ABCD0
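
For anyone trying to confirm the same growth, a minimal sketch of per-step memory logging might look like the following. The helper names `format_gib` and `log_cuda_memory` are hypothetical and not part of this repository. The sketch assumes a CUDA build of PyTorch. The figures it prints count only tensors allocated by PyTorch, so they will usually be lower than what nvidia-smi reports.

```python
def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


def log_cuda_memory(step, every=100):
    """Print current and peak allocated CUDA memory every `every` steps."""
    # Imported here so format_gib stays usable without PyTorch installed.
    import torch

    if step % every != 0 or not torch.cuda.is_available():
        return
    allocated = torch.cuda.memory_allocated()
    peak = torch.cuda.max_memory_allocated()
    print(f"step {step}: allocated={format_gib(allocated)}, peak={format_gib(peak)}")
```

Calling `log_cuda_memory(step)` once per iteration of the training loop should show whether allocated memory keeps climbing or levels off after the first few steps.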

Hmm... Which version of PyTorch are you using? Depending on the version, there may be a memory leak.

S-aiueo32 avatar Nov 14 '20 14:11 S-aiueo32

My PyTorch version is 1.7.

6ABCD0 avatar Nov 14 '20 14:11 6ABCD0

I used PyTorch 1.3 when I worked on this project. If you can, try running it with pipenv, which will reproduce my environment. I never hit this issue in over 100 epochs of training.

S-aiueo32 avatar Nov 14 '20 14:11 S-aiueo32

Thank you, I will try it again.

6ABCD0 avatar Nov 14 '20 14:11 6ABCD0

@WangduoXie I had the same issue and this fixed it for me: https://discuss.pytorch.org/t/memory-leak-with-wgan-gp-loss/112117

tsogkas avatar May 14 '21 17:05 tsogkas

Thanks for letting me know. Best, Wangduo

6ABCD0 avatar May 15 '21 01:05 6ABCD0

Just delete "torch.autograd.set_detect_anomaly(True)" from train.py, and then it works.

HITRainer avatar Nov 08 '21 09:11 HITRainer
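
If anomaly detection is still wanted for occasional debugging, one option is to gate it behind a flag instead of deleting it. This is only a sketch: the `--detect_anomaly` option and the `build_parser` and `configure_autograd` helpers are hypothetical and do not exist in train.py. `torch.autograd.set_detect_anomaly` is the same call quoted above.

```python
import argparse


def build_parser():
    """Hypothetical parser; --detect_anomaly is not an existing train.py option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--detect_anomaly",
        action="store_true",
        help="enable autograd anomaly detection (debugging only)",
    )
    return parser


def configure_autograd(detect_anomaly):
    """Turn autograd anomaly detection on or off for the whole run."""
    # Imported here so the parser can be built without loading PyTorch.
    import torch

    # Anomaly detection records a traceback for each forward op, so it is
    # left off unless a backward-pass error is being tracked down.
    torch.autograd.set_detect_anomaly(detect_anomaly)
```

With this in place, `configure_autograd(args.detect_anomaly)` would be called once before the training loop, so a normal run leaves anomaly detection off and a debugging run adds `--detect_anomaly`.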