
src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

LiangSiyuan21 opened this issue on Jan 18, 2018 · 6 comments

Hi, dear authors. When I trained FPN on my machine, I ran into a problem. I am training on the VOC2007 dataset: R-FCN trains on it successfully, but FPN cannot. Could you tell me why this is? I would be grateful. Thanks!

This is the error in my terminal:

Epoch[0] Batch [2200] Speed: 0.44 samples/sec Train-RPNAcc=0.983708, RPNLogLoss=0.045462, RPNL1Loss=0.026632, Proposal FG Fraction=0.054380, R-CNN FG Accuracy=0.000555, RCNNAcc=0.942830, RCNNLogLoss=0.346585, RCNNL1Loss=0.130626,
Epoch[0] Batch [2300] Speed: 0.44 samples/sec Train-RPNAcc=0.983983, RPNLogLoss=0.044848, RPNL1Loss=0.026666, Proposal FG Fraction=0.055165, R-CNN FG Accuracy=0.000862, RCNNAcc=0.942161, RCNNLogLoss=0.347068, RCNNL1Loss=0.132162,
Epoch[0] Batch [2400] Speed: 0.43 samples/sec Train-RPNAcc=0.984242, RPNLogLoss=0.044169, RPNL1Loss=0.026614, Proposal FG Fraction=0.055419, R-CNN FG Accuracy=0.000851, RCNNAcc=0.942020, RCNNLogLoss=0.346936, RCNNL1Loss=0.132555,
Epoch[0] Batch [2500] Speed: 0.43 samples/sec Train-RPNAcc=0.984428, RPNLogLoss=0.043493, RPNL1Loss=0.026757, Proposal FG Fraction=0.055301, R-CNN FG Accuracy=0.001144, RCNNAcc=0.942257, RCNNLogLoss=0.343534, RCNNL1Loss=0.132003,
[10:23:19] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:23:19] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x26a3cc) [0x7f4a20c993cc]
[bt] (1) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x254f5e8) [0x7f4a22f7e5e8]
[bt] (2) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x25529d1) [0x7f4a22f819d1]
[bt] (3) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x50765a) [0x7f4a20f3665a]
[bt] (4) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2078988) [0x7f4a22aa7988]
[bt] (5) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2078e68) [0x7f4a22aa7e68]
[bt] (6) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ff888d) [0x7f4a22a2788d]
[bt] (7) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ffc9e3) [0x7f4a22a2b9e3]
[bt] (8) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ffcbe6) [0x7f4a22a2bbe6]
[bt] (9) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ff9e2b) [0x7f4a22a28e2b]

[10:23:19] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:23:19] src/engine/./threaded_engine.h:370: [10:23:19] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

LiangSiyuan21 · Jan 18, 2018

If you use FPN, make sure your GPUs have at least 12 GB of memory.

YuwenXiong · Jan 18, 2018

Hello, this is my GPU information. I use only one GPU, but it does have at least 12 GB of memory. Can I solve the problem by using more GPUs or by changing parameters in the configs?

Thu Jan 18 12:35:19 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
| N/A   61C    P0   149W / 149W |   9578MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:07:00.0 Off |                    0 |
| N/A   28C    P8    31W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   29C    P8    30W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     86645     C  python                                        9565MiB |

LiangSiyuan21 · Jan 18, 2018

@LiangSiyuan21, you can reduce the training scales (SCALES) in the xxx.yaml config to make the input images smaller. A sketch of the relevant fields is below.
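For reference, a minimal sketch of the fields that usually matter for memory, assuming your config follows the FPN yamls shipped with this repo (where SCALES is a (short side, max side) pair and gpus selects the training devices). The exact key names and values below are assumptions, so check them against your own xxx.yaml:

    gpus: '0,1'            # spread training over more GPUs if they are free
    SCALES:
    - 600                  # shorter image side; smaller values save the most memory
    - 1000                 # cap on the longer image side
    TRAIN:
      BATCH_IMAGES: 1      # images per GPU; keep this at 1 to limit memory

Shrinking SCALES reduces every feature map in the FPN pyramid, so it is usually the most effective single change, at the cost of some accuracy on small objects.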

larsoncs · Jan 19, 2018

@larsoncs I have the same question. Could you please tell me in detail how to adjust the params in xxx.yaml?

maruitao · May 10, 2018

I have the same problem. Could someone tell me how to deal with it? My GPUs are two GTX 1080s.

songzenghui · Jul 16, 2018

Please purchase a Titan V.

engineer1109 · Jul 24, 2018