
out of memory: GPU memory keeps growing during training

Open songhat opened this issue 2 years ago • 4 comments

Things I've already checked: loss.item() is used correctly, and the dataloader isn't accumulating data either.

songhat avatar Feb 23 '23 13:02 songhat
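For context, the loss.item() check mentioned above refers to a common PyTorch leak: appending loss *tensors* to a list keeps each iteration's autograd graph alive. A minimal sketch of the safe pattern (the toy tensor and loop are illustrative, not from the repo):

```python
import torch

x = torch.randn(8, 3, requires_grad=True)
losses = []
for _ in range(3):
    loss = (x * 2).sum()
    # Store loss.item() (a plain Python float) instead of the tensor,
    # so no computation graph is retained across iterations.
    losses.append(loss.item())
```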

Someone later mentioned that on line 76 of train.py, having these two calls in the wrong order seems to cause a GPU memory leak. Change

    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):

to

    for ii, (img, bbox_, label_, scale) in enumerate(tqdm(dataloader)):

deepxzy avatar Mar 21 '23 09:03 deepxzy
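For what it's worth, `enumerate(tqdm(dataloader))` is the usual idiom: tqdm wraps the dataloader directly, so it can read its length and show a proper total, whereas `enumerate` objects expose no `__len__`. A minimal sketch with a plain list standing in for the dataloader:

```python
from tqdm import tqdm

data = [10, 20, 30]  # stands in for a DataLoader

# Recommended order: wrap the iterable itself, then enumerate it.
# tqdm can read len(data) and display a total; disable=True just
# silences the progress bar output for this sketch.
seen = []
for ii, x in enumerate(tqdm(data, disable=True)):
    seen.append((ii, x))

# The other order, tqdm(enumerate(data)), hides the length from
# tqdm, since enumerate objects have no __len__.
```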

@deepxzy hi! Thanks for the answer. I tried your fix, but it doesn't work for me!

songhat avatar Mar 21 '23 14:03 songhat

I have a similar problem of memory growing steadily during training; after debugging I found that memory usage keeps increasing during the eval stage.

fatejzz avatar Aug 10 '23 07:08 fatejzz
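One common cause of memory growth in an eval stage (not confirmed to be the cause here) is running inference with autograd enabled, so each batch's computation graph is retained. A minimal sketch of an eval loop guarded by `torch.no_grad()` (the model, dataloader, and metric are hypothetical placeholders):

```python
import torch

def evaluate(model, dataloader, device="cpu"):
    """Hypothetical eval loop. torch.no_grad() stops autograd from
    building and retaining computation graphs across batches."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for imgs, _targets in dataloader:
            out = model(imgs.to(device))
            # .item() yields a Python float, so no tensor references
            # accumulate from batch to batch.
            total += out.sum().item()
    return total
```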

I train on the NVIDIA PyTorch Docker image and also have this problem. Disabling pin_memory resolved it for me. In train.py:

    test_dataloader = data_.DataLoader(testset,
                                       batch_size=1,
                                       num_workers=opt.test_num_workers,
                                       shuffle=False,
                                       pin_memory=False)

hungphandinh92it avatar May 12 '24 03:05 hungphandinh92it