san-torch

not enough memory: you tried to allocate 15GB. Buy new RAM! Is this about CPU RAM or GPU memory?

Open • SeekPoint opened this issue 9 years ago • 1 comment

I have 32 GB of CPU RAM and two GPUs (GTX 1080, 8 GB each) on my machine. Why can't it allocate 15 GB?

rzai@rzai00:~/prj/san-torch/prepro$ CUDA_VISIBLE_DEVICES=1 th prepro_img_vgg.lua -input_json ../data/vqa_data_prepro.json -image_root /home/rzai/mscoco.org-visualqa.org/ -cnn_proto /home/rzai/VGG_ILSVRC_19_layers_deploy.prototxt -cnn_model /home/rzai/VGG_ILSVRC_19_layers.caffemodel
{
  batch_size : 20
  gpuid : 6
  out_name_train : "../data/vqa_data_img_vgg_train.h5"
  out_name_test : "../data/vqa_data_img_vgg_test.h5"
  cnn_proto : "/home/rzai/VGG_ILSVRC_19_layers_deploy.prototxt"
  cnn_model : "/home/rzai/VGG_ILSVRC_19_layers.caffemodel"
  backend : "cudnn"
  image_root : "/home/rzai/mscoco.org-visualqa.org/"
  input_json : "../data/vqa_data_prepro.json"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded /home/rzai/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> (41) -> (42) -> (43) -> (44) -> (45) -> (46) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
  (38): nn.View(-1)
  (39): nn.Linear(25088 -> 4096)
  (40): cudnn.ReLU
  (41): nn.Dropout(0.500000)
  (42): nn.Linear(4096 -> 4096)
  (43): cudnn.ReLU
  (44): nn.Dropout(0.500000)
  (45): nn.Linear(4096 -> 1000)
  (46): cudnn.SoftMax
}
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
}
processing 82460 images...
/home/rzai/torch/install/bin/luajit: $ Torch: not enough memory: you tried to allocate 15GB. Buy new RAM! at /home/rzai/torch/pkg/torch/lib/TH/THGeneral.c:270
stack traceback:
  [C]: at 0x7f7b186bbe80
  [C]: in function 'FloatTensor'
  prepro_img_vgg.lua:120: in main chunk
  [C]: in function 'dofile'
  ...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x00406670
rzai@rzai00:~/prj/san-torch/prepro$

SeekPoint • Nov 25 '16 05:11
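Both tracebacks end in `[C]: in function 'FloatTensor'`, with the error raised from `TH/THGeneral.c`. `torch.FloatTensor` is a CPU tensor and TH is Torch's CPU library (GPU allocations go through cutorch/THC and fail with a CUDA out-of-memory error instead), so the request that fails is for CPU RAM, not GPU memory. It is one contiguous allocation, so whether it succeeds depends on how much memory is actually free at that moment rather than on the installed total. The Python sketch below estimates how large such a buffer gets. It assumes, without confirmation from `prepro_img_vgg.lua`, that the script preallocates a single float32 tensor of shape N×512×14×14 (VGG-19 pool5 features for 448×448 inputs), and that Torch's message reports the request in whole GiB, rounded down.

```python
# Rough host-RAM estimate for one preallocated float32 feature buffer.
# ASSUMED layout (not confirmed by prepro_img_vgg.lua): (N, 512, 14, 14),
# i.e. VGG-19 pool5 features for 448x448 input images.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def buffer_bytes(num_images: int, channels: int = 512, height: int = 14,
                 width: int = 14, bytes_per_elem: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to hold every image's feature map in one contiguous tensor."""
    return num_images * channels * height * width * bytes_per_elem


def reported_gb(num_bytes: int) -> int:
    """Whole GiB rounded down, assumed to match Torch's 'allocate %dGB' text."""
    return num_bytes // GIB


if __name__ == "__main__":
    n = 82460  # image count printed in the first log
    size = buffer_bytes(n)
    print(f"per image: {buffer_bytes(1) / 1024:.0f} KiB")
    print(f"all {n} images: {size / GIB:.1f} GiB, reported as {reported_gb(size)}GB")
```

With the 82460 images from the first log, this assumed layout comes to about 30.8 GiB, which would be reported as 30GB, the same figure as in the second log. The first log's 15GB does not fit this shape, so that version of the script may lay out its buffer differently; treat the numbers as illustrative.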

I am running it with 16 GB of CPU RAM and a GTX 1060 with 6 GB of GPU memory. How can I reduce the amount of RAM it uses?

table: 0x40f628c0
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded ../image_model/VGG_ILSVRC_19_layers.caffemodel
Found Environment variable CUDNN_PATH = /home/ohil17yo36/san-torch-master/cuda/lib64/libcudnn.so.5
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> (38) -> (39) -> (40) -> (41) -> (42) -> (43) -> (44) -> (45) -> (46) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
  (38): nn.View(-1)
  (39): nn.Linear(25088 -> 4096)
  (40): cudnn.ReLU
  (41): nn.Dropout(0.500000)
  (42): nn.Linear(4096 -> 4096)
  (43): cudnn.ReLU
  (44): nn.Dropout(0.500000)
  (45): nn.Linear(4096 -> 1000)
  (46): cudnn.SoftMax
}
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
  (1): cudnn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): cudnn.ReLU
  (3): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): cudnn.ReLU
  (5): cudnn.SpatialMaxPooling(2x2, 2,2)
  (6): cudnn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): cudnn.ReLU
  (8): cudnn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): cudnn.ReLU
  (10): cudnn.SpatialMaxPooling(2x2, 2,2)
  (11): cudnn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): cudnn.ReLU
  (13): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): cudnn.ReLU
  (15): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): cudnn.ReLU
  (17): cudnn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): cudnn.ReLU
  (19): cudnn.SpatialMaxPooling(2x2, 2,2)
  (20): cudnn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): cudnn.ReLU
  (22): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): cudnn.ReLU
  (24): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): cudnn.ReLU
  (26): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): cudnn.ReLU
  (28): cudnn.SpatialMaxPooling(2x2, 2,2)
  (29): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): cudnn.ReLU
  (31): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): cudnn.ReLU
  (33): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): cudnn.ReLU
  (35): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): cudnn.ReLU
  (37): cudnn.SpatialMaxPooling(2x2, 2,2)
}
/home/ohil17yo36/DeepMind-Atari-Deep-Q-Learner-master/torch/bin/luajit: $ Torch: not enough memory: you tried to allocate 30GB. Buy new RAM! at /tmp/luarocks_torch-scm-1-642/torch7/lib/TH/THGeneral.c:270
stack traceback:
  [C]: at 0x7f5f167feb20
  [C]: in function 'FloatTensor'
  prepro_img_vgg.lua:102: in main chunk
  [C]: at 0x004057a0

ohil17yo36 • Mar 14 '17 13:03
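On a 16 GB machine, one way around this is to avoid holding every image's features in RAM at once: run the network in batches (the first log already shows `batch_size : 20`) and write each batch to the output file as soon as it is computed, so peak host memory stays on the order of one batch rather than the whole dataset. The Python sketch below shows the pattern with raw float32 output. `extract_features` is a hypothetical stand-in for the VGG forward pass, and the 512×14×14 per-image feature size is an assumption. With HDF5 output, the equivalent would be creating the full-size dataset on disk and assigning each batch into a slice of it (for example `dset[start:end] = batch` in h5py); whether `prepro_img_vgg.lua` can be adapted the same way depends on the HDF5 binding it uses.

```python
# Sketch: write CNN features to disk one batch at a time instead of
# preallocating one tensor for every image, so peak host RAM is roughly
# one batch. `extract_features` is a HYPOTHETICAL stand-in for the VGG
# forward pass; it is not part of san-torch.
import os
import tempfile
from array import array
from typing import Iterator, List

FEAT_DIM = 512 * 14 * 14  # ASSUMED per-image feature size (pool5, 448x448 input)


def extract_features(image_ids: List[int]) -> array:
    """Hypothetical forward pass: returns len(image_ids) * FEAT_DIM float32 values."""
    out = array("f")
    for img in image_ids:
        out.extend([float(img)] * FEAT_DIM)  # dummy values, one block per image
    return out


def batches(num_images: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of image indices; the last one may be shorter."""
    for start in range(0, num_images, batch_size):
        yield list(range(start, min(start + batch_size, num_images)))


def stream_to_file(path: str, num_images: int, batch_size: int = 20) -> int:
    """Append each batch's features to `path`; return the largest batch buffer in bytes."""
    peak = 0
    with open(path, "wb") as fh:
        for ids in batches(num_images, batch_size):
            feats = extract_features(ids)
            peak = max(peak, len(feats) * feats.itemsize)
            feats.tofile(fh)  # written out right away; the array is replaced next iteration
    return peak


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "features.bin")
        peak = stream_to_file(out_path, num_images=45, batch_size=20)
        print(f"file: {os.path.getsize(out_path)} bytes, largest batch: {peak} bytes")
```

At the assumed feature size, one 20-image batch is about 7.7 MiB, independent of how many images are processed in total.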