noscope
GTX 1070: CUDA Error: out of memory
GTX 1070 with 7.9 GB of memory: when I run run_optimizerset.sh, train_9180_18360.log shows the following errors.
train_9180_18360.log:
2017-08-28 17:31:50.229531: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-28 17:31:50.351945: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-28 17:31:50.352224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.31GiB
2017-08-28 17:31:50.352235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-08-28 17:31:50.352240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-08-28 17:31:50.352249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
layer filters size input output
0 CUDA Error: out of memory: File exists
CUDA Error: out of memory
I'm getting the same error even with a 1000-frame video.
The system is optimized for a P100 GPU with 16 GB of memory. This diff is confirmed to work on a K80; you may need to change 0.8 to something much lower:
diff --git a/tensorflow/noscope/noscope.cc b/tensorflow/noscope/noscope.cc
index 4cd6a14..98b80e2 100644
--- a/tensorflow/noscope/noscope.cc
+++ b/tensorflow/noscope/noscope.cc
@@ -60,7 +60,7 @@ static tensorflow::Session* InitSession(const std::string& graph_fname) {
tensorflow::SessionOptions opts;
tensorflow::GraphDef graph_def;
// YOLO needs some memory
- opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.9);
+ opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.8);
// opts.config.mutable_gpu_options()->set_allow_growth(true);
tensorflow::Status status = NewSession(opts, &session);
TF_CHECK_OK(status);
I'd be happy to merge a pull request that automatically detects the amount of memory necessary for YOLOv2 as a fraction of the available GPU memory.
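As a rough illustration of what that detection might look like (a sketch under assumptions, not part of the noscope codebase): cudaMemGetInfo from the CUDA runtime reports free and total device memory, and the fraction could be derived from that after reserving a budget for YOLOv2. The yolo_reserve_bytes parameter below is hypothetical and would have to be measured for the YOLO configuration in use.

#include <algorithm>
#include <cstddef>
#include <cuda_runtime.h>

// Sketch: pick a per-process GPU memory fraction that leaves room for YOLOv2.
// yolo_reserve_bytes is a hypothetical, measured budget for the YOLO network;
// it is not defined anywhere in noscope.
static double MemoryFractionLeavingRoomForYolo(size_t yolo_reserve_bytes) {
  size_t free_bytes = 0, total_bytes = 0;
  // cudaMemGetInfo reports free and total device memory in bytes.
  if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess || total_bytes == 0) {
    return 0.8;  // fall back to the hard-coded value from the diff above
  }
  const size_t reserve = std::min(free_bytes, yolo_reserve_bytes);
  const double fraction =
      static_cast<double>(free_bytes - reserve) / static_cast<double>(total_bytes);
  return std::max(0.1, std::min(fraction, 0.9));  // clamp to a sane range
}

The result would then replace the hard-coded constant, e.g. opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(MemoryFractionLeavingRoomForYolo(yolo_reserve_bytes)).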
0.8 works fine for the GTX 1070 and there is no memory error, but now I'm getting Segmentation fault (core dumped). What could it be?
Update: the same issue occurs with yolo9000 and tiny-yolo.
Please paste the full output log from the run.
(noscope) arsey@ml-machine:~/noscope/data/experiments/jackson-town-square/train/jackson-town-square_convnet_128_32_2.pb-non_blocked_mse.src$ ./run_optimizerset.sh 1
2017-08-29 20:17:53.261228: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-29 20:17:53.390616: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-29 20:17:53.391159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7465
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.83GiB
2017-08-29 20:17:53.391171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-08-29 20:17:53.391175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-08-29 20:17:53.391180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:05:00.0)
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
1 max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
25 route 16
26 conv 64 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 64
27 reorg / 2 38 x 38 x 64 -> 19 x 19 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
30 conv 425 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 425
31 detection
Loading weights from /home/arsey/projects/darknet/yolo.weights...Done!
Dumping video
./run_optimizerset.sh: line 36: 12270 Segmentation fault (core dumped) /home/arsey/noscope/tensorflow-noscope/bazel-bin/tensorflow/noscope/noscope --diff_thresh=0 --distill_thresh_lower=0 --distill_thresh_upper=0 --skip_small_cnn=0 --skip_diff_detection=0 --skip=30 --avg_fname=/home/arsey/noscope/data/cnn-avg/jackson-town-square.txt --graph=/home/arsey/noscope/data/cnn-models/jackson-town-square_convnet_128_32_2.pb --video=/home/arsey/noscope/data/videos/jackson-town-square.mp4 --yolo_cfg=/home/arsey/projects/darknet/cfg/yolo.cfg --yolo_weights=/home/arsey/projects/darknet/yolo.weights --yolo_class=2 --confidence_csv=/home/arsey/noscope/data/experiments/jackson-town-square/train/jackson-town-square_convnet_128_32_2.pb-non_blocked_mse.src/train_${START_FRAME}_${END_FRAME}.csv --start_from=${START_FRAME} --nb_frames=$LEN --dumped_videos=/home/arsey/noscope/data/video-cache/jackson-town-square_0_250_1.bin --diff_detection_weights=/dev/null --use_blocked=0 --ref_image=0
real 0m2.665s
user 0m2.176s
sys 0m0.620s
Any thoughts?
Thanks, 0.8 works for the 1070. But there is another problem that is not a memory error: my computer has 8 GB of RAM, and when I run motherdog.py with a high frame count or a low target_fp, the problem appears. I want to use 918,000 frames and a low target_fp to run motherdog; how do I change the code?
The segmentation fault was related to the wrong number of frames being set for training (250) in noscope_motherdog.py, while the video had 30 frames per second. So the error was appearing in noscope_data.cc, inside this for loop:
for (size_t i = 0; i < kNbFrames; i++) {
  cap >> frame;
  if (i % kSkip_ == 0) {
    std::cout << "frame: " << i << "\n";
    const size_t ind = i / kSkip_;
    cv::resize(frame, yolo_frame, NoscopeData::kYOLOResol_, 0, 0, cv::INTER_NEAREST);
    cv::resize(frame, diff_frame, NoscopeData::kDiffResol_, 0, 0, cv::INTER_NEAREST);
    cv::resize(frame, dist_frame, NoscopeData::kDistResol_, 0, 0, cv::INTER_NEAREST);
    dist_frame.convertTo(dist_frame_f, CV_32FC3);
    if (!yolo_frame.isContinuous()) {
      throw std::runtime_error("yolo frame is not continuous");
    }
    if (!diff_frame.isContinuous()) {
      throw std::runtime_error("diff frame is not continuous");
    }
    if (!dist_frame.isContinuous()) {
      throw std::runtime_error("dist frame is not continuous");
    }
    if (!dist_frame_f.isContinuous()) {
      throw std::runtime_error("dist frame f is not continuous");
    }
    memcpy(&yolo_data_[ind * kYOLOFrameSize_], yolo_frame.data, kYOLOFrameSize_);
    memcpy(&diff_data_[ind * kDiffFrameSize_], diff_frame.data, kDiffFrameSize_);
    memcpy(&dist_data_[ind * kDistFrameSize_], dist_frame_f.data, kDistFrameSize_ * sizeof(float));
  }
}
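The crash goes away once the frame count matches the video, but a defensive check would make the failure obvious. This is only a sketch of such a check (my assumption, not a fix from the noscope repository, and it assumes <string> is available for std::to_string): if kNbFrames exceeds the number of frames actually in the file, cap >> frame starts returning empty Mats, and bailing out before the resize/memcpy code turns the crash into a readable error.

for (size_t i = 0; i < kNbFrames; i++) {
  cap >> frame;
  // Sketch: stop cleanly if the video ends before kNbFrames frames were read,
  // instead of handing an empty Mat to the resize/memcpy code below.
  if (frame.empty()) {
    throw std::runtime_error("video ended at frame " + std::to_string(i) +
                             " but " + std::to_string(kNbFrames) + " frames were requested");
  }
  if (i % kSkip_ == 0) {
    // ... rest of the loop body as above ...
  }
}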
Unfortunately, the codebase currently assumes videos are 30 FPS.
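A small sketch of how that assumption could be relaxed, using OpenCV's standard CAP_PROP_FPS property (the function and variable names here are illustrative, not noscope's own):

#include <opencv2/opencv.hpp>
#include <string>

// Sketch: read the frame rate from the video instead of assuming 30 FPS.
static double VideoFps(const std::string& video_fname) {
  cv::VideoCapture cap(video_fname);
  double fps = cap.get(cv::CAP_PROP_FPS);
  if (fps <= 0.0) {
    fps = 30.0;  // fall back to the current hard-coded assumption
  }
  return fps;
}

Anywhere the code converts between seconds and frame counts could then use this value instead of the constant 30.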
I have 8 GB of memory, so I use 270,000 frames and run run_optimizerset.sh separately in four steps. In the end, it works.