
CUDA out of memory

Open zihaozhang9 opened this issue 5 years ago • 3 comments

```
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    results = importlib.import_module(opt['metaLearner']).run(opt,data)
  File "/home/user/myproject/FewShotLearning/model/lstm/train-lstm.py", line 123, in run
    opt['batchSize'][opt['nTrainShot']])
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/myproject/FewShotLearning/model/lstm/metaLearner.py", line 149, in forward
    output, loss = learner(testInput, testTarget)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/myproject/FewShotLearning/model/lstm/learner.py", line 51, in forward
    output = self.modelF.net(inputs)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/myproject/FewShotLearning/model/lstm-classifier.py", line 79, in forward
    x = self.layer2(x)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/home/user/anaconda3/envs/FewShotLearning/lib/python2.7/site-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 16.25 MiB (GPU 0; 11.91 GiB total capacity; 8.65 GiB already allocated; 17.06 MiB free; 950.50 MiB cached)
```

```
nvidia-smi  Wed Apr  3 20:53:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   35C    P8    16W / 250W |   1035MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   36C    P8    18W / 250W |     10MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   30C    P8    17W / 250W |     10MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   33C    P8    17W / 250W |     10MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10217      C   ...qun/anaconda2/envs/pt04_py27/bin/python  1021MiB |
+-----------------------------------------------------------------------------+
```
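A note on the diagnostics above: the OOM is reported on GPU 0, which is also the only card another process (PID 10217) is occupying, while GPUs 1-3 sit idle. Moving the run to an idle card is only a partial fix, since the error says 8.65 GiB was already allocated by the training process itself, but it at least removes the contention. A minimal sketch, assuming nothing about the repo beyond it using the default CUDA device:

```python
import os

# Assumption, not something this repo documents: pin the process to physical
# GPU 1, which is idle in the nvidia-smi output above. Set this before the
# first CUDA call (or export it in the shell when launching main.py).
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch

# Inside this process, cuda:0 now maps to physical GPU 1.
print(torch.cuda.get_device_name(0))
```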

zihaozhang9 · Apr 03 '19

I also encountered this problem. Have you solved it?

lwzhaojun · Dec 17 '19

> I also encountered this problem. Have you solved it?

It was too long ago, I've forgotten. In any case, you should reduce the batch_size.

zihaozhang9 · Dec 17 '19
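For reference, a rough sketch of the batch-size reduction suggested above. Only `opt['batchSize'][opt['nTrainShot']]` appears in the traceback; the dict layout and the helper below are assumptions for illustration, not this repo's actual config API:

```python
# Hypothetical helper: halve every per-shot batch size before the meta-learner
# is launched. Assumes opt['batchSize'] maps shot count -> integer batch size,
# as the indexing in the traceback suggests.
def shrink_batch_sizes(opt, factor=2):
    opt['batchSize'] = {shots: max(1, size // factor)
                        for shots, size in opt['batchSize'].items()}
    return opt

# Usage sketch with made-up values:
opt = {'batchSize': {1: 32, 5: 16}, 'nTrainShot': 5}
opt = shrink_batch_sizes(opt)
print(opt['batchSize'][opt['nTrainShot']])  # 8
```

If halving once is not enough, keep going: the failed 16.25 MiB allocation inside batch norm only means the pool was already exhausted by earlier activations, so any reduction in per-episode batch size (or input resolution) frees memory across the whole forward pass.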

Thank you. Let me try it.

lwzhaojun · Dec 17 '19