
out of memory: GPU memory keeps growing during training

Open songhat opened this issue 2 years ago • 4 comments

Things I've already checked: loss.item() is used correctly, and the dataloader isn't accumulating data either.

songhat avatar Feb 23 '23 13:02 songhat
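For context, the loss.item() check mentioned above refers to a common PyTorch leak: appending loss *tensors* to a list keeps each iteration's autograd graph alive. A minimal sketch of the safe pattern (the toy tensor and loop are illustrative, not from the repo):

```python
import torch

x = torch.randn(8, 3, requires_grad=True)
losses = []
for _ in range(3):
    loss = (x * 2).sum()
    # Store loss.item() (a plain Python float) instead of the tensor,
    # so no computation graph is retained across iterations.
    losses.append(loss.item())
```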

Someone later mentioned that on line 76 of train.py, having these two calls in the wrong order seems to cause a GPU memory leak. Change

    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):

to

    for ii, (img, bbox_, label_, scale) in enumerate(tqdm(dataloader)):

deepxzy avatar Mar 21 '23 09:03 deepxzy
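For what it's worth, `enumerate(tqdm(dataloader))` is the usual idiom: tqdm wraps the dataloader directly, so it can read its length and show a proper total, whereas `enumerate` objects expose no `__len__`. A minimal sketch with a plain list standing in for the dataloader:

```python
from tqdm import tqdm

data = [10, 20, 30]  # stands in for a DataLoader

# Recommended order: wrap the iterable itself, then enumerate it.
# tqdm can read len(data) and display a total; disable=True just
# silences the progress bar output for this sketch.
seen = []
for ii, x in enumerate(tqdm(data, disable=True)):
    seen.append((ii, x))

# The other order, tqdm(enumerate(data)), hides the length from
# tqdm, since enumerate objects have no __len__.
```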

@deepxzy hi! Thanks for the answer. I tried your fix, but it doesn't work for me!

songhat avatar Mar 21 '23 14:03 songhat

I have a similar problem of memory growing steadily during training; after debugging I found that memory usage keeps increasing during the eval stage.

fatejzz avatar Aug 10 '23 07:08 fatejzz
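One common cause of memory growth in an eval stage (not confirmed to be the cause here) is running inference with autograd enabled, so each batch's computation graph is retained. A minimal sketch of an eval loop guarded by `torch.no_grad()` (the model, dataloader, and metric are hypothetical placeholders):

```python
import torch

def evaluate(model, dataloader, device="cpu"):
    """Hypothetical eval loop. torch.no_grad() stops autograd from
    building and retaining computation graphs across batches."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for imgs, _targets in dataloader:
            out = model(imgs.to(device))
            # .item() yields a Python float, so no tensor references
            # accumulate from batch to batch.
            total += out.sum().item()
    return total
```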

I train on the NVIDIA PyTorch Docker image and also have this problem. Disabling pin_memory resolved it for me. In train.py:

    test_dataloader = data_.DataLoader(testset,
                                       batch_size=1,
                                       num_workers=opt.test_num_workers,
                                       shuffle=False,
                                       pin_memory=False)

hungphandinh92it avatar May 12 '24 03:05 hungphandinh92it