
RuntimeError: CUDA Error: out of memory

Open pharouknucleus opened this issue 6 years ago • 11 comments

Please help me resolve this issue

pharouknucleus avatar Sep 21 '18 07:09 pharouknucleus

try smaller batch size
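
For reference, the batch size is set where the script builds its `DataLoader`. A minimal sketch, assuming a `train_dataset` object like the one the CheXNet script already constructs (the variable names here are hypothetical):

```python
from torch.utils.data import DataLoader

# train_dataset is whatever Dataset the script already builds;
# the only change needed is a smaller batch_size.
train_loader = DataLoader(train_dataset, batch_size=16,
                          shuffle=True, num_workers=4, pin_memory=True)
```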

saadmanrafat avatar Oct 06 '18 22:10 saadmanrafat

My batch size is 64, the input size is 256, and the output size is 242. By how much should I reduce it?

pharouknucleus avatar Oct 07 '18 11:10 pharouknucleus

Try batch size 8, 16, or 32 and see if it works.
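
If you want to probe this automatically, here is a rough sketch. The `make_batch` helper is hypothetical; any function returning a dummy input of the given batch size will do:

```python
import torch

def find_workable_batch_size(model, make_batch, sizes=(32, 16, 8)):
    # Try descending batch sizes until a forward pass fits in GPU memory.
    # make_batch(n) is a hypothetical helper returning a dummy input of
    # batch size n, e.g. lambda n: torch.randn(n, 3, 224, 224).cuda()
    for n in sizes:
        try:
            with torch.no_grad():
                model(make_batch(n))
            return n
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()  # release the failed allocation
    raise RuntimeError("even the smallest batch size does not fit")
```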

saadmanrafat avatar Oct 07 '18 22:10 saadmanrafat

It is still showing me this error:

```
Traceback (most recent call last):
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 149, in <module>
    main()
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 95, in main
    output = model(input_var)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\parallel\data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\Documents\1Research\algortihm\CheXNet-master\CheXNet-master\m3.py", line 144, in forward
    x = self.densenet121(x)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\models\densenet.py", line 220, in forward
    features = self.features(x)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nasir Isa\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
```

pharouknucleus avatar Oct 08 '18 11:10 pharouknucleus

@omrfrkmfy Were you ever able to figure out a solution to the problem? I'm dealing with the same issue

robhyb19 avatar May 19 '19 06:05 robhyb19

The issue is that your graphics card has too little memory. You need one with more memory.
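
To see how much memory the card actually has and how much the model is using, a quick check with PyTorch's own counters (standard `torch.cuda` API):

```python
import torch

device = torch.device("cuda:0")
props = torch.cuda.get_device_properties(device)
# Total device memory vs. memory currently held by tensors.
print("total:     %.1f GiB" % (props.total_memory / 1024**3))
print("allocated: %.1f GiB" % (torch.cuda.memory_allocated(device) / 1024**3))
```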


pharouknucleus avatar May 20 '19 12:05 pharouknucleus

With 4 worker cores on an NVIDIA P100, I had to use a batch size of 12. But the AUROC is 49%, maybe due to the small batch size.
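
If memory is what caps the batch size, gradient accumulation can recover a larger effective batch. A minimal sketch, assuming the script's usual `model`, `criterion`, `optimizer`, and `train_loader` objects:

```python
accum_steps = 4  # effective batch size = 12 * 4 = 48

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs.cuda())
    # Scale the loss so the accumulated gradient matches a big-batch update.
    loss = criterion(outputs, targets.cuda()) / accum_steps
    loss.backward()  # gradients accumulate across the accum_steps iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```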

Viswanath660 avatar May 28 '19 17:05 Viswanath660

Maybe you can try this idea: https://blog.csdn.net/xijuezhu8128/article/details/86594478

cherrymj avatar Aug 26 '19 06:08 cherrymj

I encountered the same issue and solved it by disabling gradient tracking during evaluation: call `model.eval()` and wrap the test loop (`for i, (data, label) in enumerate(test_loader): ...`) in `with torch.no_grad():` (remember to indent the loop body). This stops the model from saving intermediate results, so temporary memory is freed after each batch. Hope it helps.
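
Spelled out as a block (using the script's `model` and `test_loader`):

```python
model.eval()              # evaluation mode: no dropout, frozen batch-norm stats
with torch.no_grad():     # don't build the autograd graph
    for i, (data, label) in enumerate(test_loader):
        output = model(data.cuda())
        # No intermediate activations are retained, so GPU memory
        # is released after every batch.
```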

LiJiaqi96 avatar Mar 22 '20 03:03 LiJiaqi96

> With 4 worker cores on an NVIDIA P100, I had to use a batch size of 12. But the AUROC is 49%, maybe due to the small batch size.

I am dealing with the same issue, and when I run it multiple times I get different results. Did you solve it or find out why?

Candyeeee avatar Apr 14 '20 01:04 Candyeeee

Hello, I know this is very late, and it seems the owner has not maintained the code for years. But if you end up with this problem and somehow run into this issue, try my solution: https://github.com/arnoweng/CheXNet/pull/39. I just started learning PyTorch today and I'm not a PyTorch pro, so it is possible my changes introduce logic flaws. If that's the case, please tell me :)

icekang avatar Nov 15 '20 13:11 icekang