Deformable-ConvNets
src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory
Hi, dear authors. When I trained FPN on my computer, I ran into a problem. I trained on the VOC2007 dataset; RFCN trained on it successfully, but FPN can't. Could you tell me why this is? I would be grateful, thanks!
This is the error in my terminal:
Epoch[0] Batch [2200] Speed: 0.44 samples/sec Train-RPNAcc=0.983708, RPNLogLoss=0.045462, RPNL1Loss=0.026632, Proposal FG Fraction=0.054380, R-CNN FG Accuracy=0.000555, RCNNAcc=0.942830, RCNNLogLoss=0.346585, RCNNL1Loss=0.130626,
Epoch[0] Batch [2300] Speed: 0.44 samples/sec Train-RPNAcc=0.983983, RPNLogLoss=0.044848, RPNL1Loss=0.026666, Proposal FG Fraction=0.055165, R-CNN FG Accuracy=0.000862, RCNNAcc=0.942161, RCNNLogLoss=0.347068, RCNNL1Loss=0.132162,
Epoch[0] Batch [2400] Speed: 0.43 samples/sec Train-RPNAcc=0.984242, RPNLogLoss=0.044169, RPNL1Loss=0.026614, Proposal FG Fraction=0.055419, R-CNN FG Accuracy=0.000851, RCNNAcc=0.942020, RCNNLogLoss=0.346936, RCNNL1Loss=0.132555,
Epoch[0] Batch [2500] Speed: 0.43 samples/sec Train-RPNAcc=0.984428, RPNLogLoss=0.043493, RPNL1Loss=0.026757, Proposal FG Fraction=0.055301, R-CNN FG Accuracy=0.001144, RCNNAcc=0.942257, RCNNLogLoss=0.343534, RCNNL1Loss=0.132003,
[10:23:19] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:23:19] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory
Stack trace returned 10 entries:
[bt] (0) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x26a3cc) [0x7f4a20c993cc]
[bt] (1) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x254f5e8) [0x7f4a22f7e5e8]
[bt] (2) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x25529d1) [0x7f4a22f819d1]
[bt] (3) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x50765a) [0x7f4a20f3665a]
[bt] (4) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2078988) [0x7f4a22aa7988]
[bt] (5) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2078e68) [0x7f4a22aa7e68]
[bt] (6) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ff888d) [0x7f4a22a2788d]
[bt] (7) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ffc9e3) [0x7f4a22a2b9e3]
[bt] (8) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ffcbe6) [0x7f4a22a2bbe6]
[bt] (9) /home/wuyonglin/virenv/MXNet2.7/local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x1ff9e2b) [0x7f4a22a28e2b]
[10:23:19] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:23:19] src/engine/./threaded_engine.h:370: [10:23:19] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory
If you use FPN, make sure your GPUs have at least 12 GB of memory.
Hello, this is my GPU information. I use only one GPU, but it does have at least 12 GB of memory. Can I solve the problem by using more GPUs or by changing parameters in the configs?
Thu Jan 18 12:35:19 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
| N/A   61C    P0   149W / 149W |   9578MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:07:00.0 Off |                    0 |
| N/A   28C    P8    31W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   29C    P8    30W / 149W |     11MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     86645      C   python                                    9565MiB   |
@LiangSiyuan21, you can adjust the training scales in the xxx.yaml config, making them smaller!
@larsoncs I have the same question. Could you please explain in detail how to adjust those parameters in xxx.yaml?
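For anyone looking for the concrete fields: below is a minimal sketch of the memory-related settings in the FPN config. The file name xxx.yaml is a placeholder and the values are only examples; the field names (gpus, SCALES, TRAIN.BATCH_IMAGES) follow the layout of the yaml files shipped in this repo, but double-check them against your own config.

```yaml
# Hypothetical excerpt from experiments/fpn/cfgs/xxx.yaml (adjust to your file).
gpus: '0,1,2,3'     # spread training over more devices to cut per-GPU memory
SCALES:
- 600               # short side of the resized input (smaller => less memory)
- 1000              # maximum long side (smaller => less memory)
TRAIN:
  BATCH_IMAGES: 1   # images per GPU; keep at 1 when memory is tight
```

SCALES sets the target short side and the maximum long side of the resized image, so smaller values shrink every feature map in the FPN pyramid; listing more devices under gpus splits the batch across them, which also lowers the per-GPU load.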
I have the same problem. Could someone tell me how to deal with it? My GPU is a GTX 1080.
x2
Please purchase a Titan V.