PoseCNN
Segmentation Fault
Hi,
While running ./experiments/scripts/demo.sh $GPU_ID, I almost always get these errors and warnings:
2018-07-06 13:14:14.755317: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-07-06 13:14:14.755381: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: thomas-HP-Pavilion-15-Notebook-PC
2018-07-06 13:14:14.755404: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: thomas-HP-Pavilion-15-Notebook-PC
2018-07-06 13:14:14.755490: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.130.0
2018-07-06 13:14:14.755530: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.130 Wed Mar 21 03:37:26 PDT 2018 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) """
2018-07-06 13:14:14.755558: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.130.0
2018-07-06 13:14:14.755572: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.130.0
Loading model weights from data/demo_models/vgg16_fcn_color_single_frame_2d_pose_add_lov_iter_160000.ckpt
data/demo_images/000002-color.png
./experiments/scripts/demo.sh: line 18: 16203 Segmentation fault (core dumped) ./tools/demo.py --gpu 0 --network vgg16_convs --model data/demo_models/vgg16_fcn_color_single_frame_2d_pose_add_lov_iter_160000.ckpt --imdb lov_keyframe --cfg experiments/cfgs/lov_color_2d.yml --rig data/LOV/camera.json --cad data/LOV/models.txt --pose data/LOV/poses.txt --background data/cache/backgrounds.pk
The segmentation fault doesn't happen every time, but it still happens often. Has anyone encountered this? Is the memory allocation warning linked to the segmentation fault? Do I not have enough RAM on my computer?
My current setup: Ubuntu 16.04, Memory: 5.8, CUDA 8.0 with cuDNN 7, TensorFlow 1.8, GPU: NVIDIA GeForce 840M
I ran into the same issue. I have 32GB of RAM on my machine and am not running any other training at this time.
If it helps: I resolved this issue by specifying the GPU ID on the command line, ./experiments/scripts/demo.sh 0
and by decreasing the number of threads per block from 1024 to 512 before compiling the layers (see the variable kThreadsPerBlock).
I am trying to embed a list of sentences using a pretrained ELMo model (so no training is happening), and I am getting the same kind of error:
2018-09-04 15:35:40.470118: W tensorflow/core/framework/allocator.cc:108] Allocation of 58989110400 exceeds 10% of system memory.
Segmentation fault (core dumped)
@TotoLulu94 do you have any idea? Thanks
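For scale, the byte count in the allocation warning above can be converted to GiB. The sketch below is only arithmetic on the number printed in the log; the helper name bytes_to_gib is hypothetical and is not part of TensorFlow.

```cpp
#include <cstdint>

// Hypothetical helper (not a TensorFlow API): convert a byte count, such as
// the one printed in the allocator warning, to GiB (1 GiB = 2^30 bytes).
constexpr double bytes_to_gib(std::uint64_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// The value from the warning in this thread, copied from the log line.
constexpr std::uint64_t kRequestedBytes = 58989110400ULL;
```

bytes_to_gib(kRequestedBytes) comes out at just under 55 GiB, so a single allocation of that size would not fit in RAM on most workstations, including the 32GB machine mentioned earlier in the thread.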
What worked for me was decreasing kThreadsPerBlock from 1024 to 512 in the different layers. I don't have any other ideas.
Hi @TotoLulu94, how do I decrease kThreadsPerBlock?
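A minimal sketch of what such a change might look like, assuming kThreadsPerBlock is declared as a plain integer constant in each layer's GPU source file (the thread names the variable but does not show its declaration, so the exact form and location are assumptions). The helper blocks_for is hypothetical and only illustrates the usual ceiling division used to size a launch grid.

```cpp
// Assumed form of the constant; the real declaration may differ.
// Changing 1024 to 512 halves the number of threads in each block.
constexpr int kThreadsPerBlock = 512;  // previously 1024

// Hypothetical helper: number of blocks needed to cover n work items,
// rounding up so that a partial block is still launched.
constexpr int blocks_for(int n) {
  return (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
}
```

With the smaller value, the same amount of work is spread over more blocks, each with fewer threads. As noted earlier in the thread, the layers have to be recompiled after the edit for it to take effect.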
I am not sure why changing kThreadsPerBlock works for some people. But hough_voting_gpu_op.cc does have a problem: the segmentation fault occurs at lines 756-757 because dx.size() and dy.size() can sometimes be zero. I added a simple if statement to check for that condition, and it works fine now.
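The kind of guard described above can be sketched as follows. This is not the actual code from hough_voting_gpu_op.cc: the function compute_extent and its use of the maximum element are hypothetical stand-ins, and only the names dx and dy come from the post. The point is that the emptiness check runs before any element is read.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for code that reads elements of dx and dy.
// Reading from an empty vector is undefined behaviour and can crash,
// so the function reports failure instead of touching the data.
bool compute_extent(const std::vector<float>& dx,
                    const std::vector<float>& dy,
                    float* width, float* height) {
  // The guard: return early if either vector has no elements.
  if (dx.empty() || dy.empty()) {
    return false;
  }
  // Safe now: both ranges are non-empty, so max_element is dereferenceable.
  *width = *std::max_element(dx.begin(), dx.end());
  *height = *std::max_element(dy.begin(), dy.end());
  return true;
}
```

A caller would check the return value and skip the detection when it is false, leaving width and height untouched.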