
cudaCheckError() failed compute outputs: too many resources requested for launch

Open TotoLulu94 opened this issue 6 years ago • 2 comments

I get this error when trying to run ./experiments/scripts/demo.sh 0:

2018-07-06 12:51:54.564555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-06 12:51:54.565020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce 840M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 1.96GiB freeMemory: 1.73GiB
2018-07-06 12:51:54.565060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce 840M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
Loading model weights from data/demo_models/vgg16_fcn_color_single_frame_2d_pose_add_lov_iter_160000.ckpt
data/demo_images/000003-color.png
2018-07-06 12:51:56.557339: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.76GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:56.736688: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:56.956865: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:57.085750: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 337.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:57.158900: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.64GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:57.328548: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1017.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:57.572087: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:57.769324: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 524.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-07-06 12:51:58.011584: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 711.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
cudaCheckError() failed compute outputs: too many resources requested for launch

I guess this is because my GPU card doesn't have enough memory. Is there a way to avoid this error without changing my GPU card?
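
For reference, "too many resources requested for launch" is the text of CUDA's cudaErrorLaunchOutOfResources: a kernel launch asked for more registers, shared memory, or threads per block than a single multiprocessor can supply, which is independent of how much free global memory the card has (the allocator lines above are only warnings). Below is a minimal sketch of the kind of post-launch check that presumably produces this message; the helper is illustrative, not the repository's actual cudaCheckError():

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative post-launch check; a check of this kind prints the error above.
// Usage, right after a kernel launch: checkLastLaunch("compute outputs");
inline void checkLastLaunch(const char* what) {
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    // For cudaErrorLaunchOutOfResources, cudaGetErrorString() returns
    // "too many resources requested for launch": the block configuration
    // (threads per block x registers per thread, plus shared memory) exceeds
    // what one SM provides. Free global memory does not matter here.
    std::fprintf(stderr, "cudaCheckError() failed %s: %s\n", what,
                 cudaGetErrorString(err));
  }
}
```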

TotoLulu94 · Jul 06 '18

Hi @TotoLulu94, I ran into the same issue. The output is:

Use network vgg16_convs in training
2018-09-14 10:28:07.777198: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-14 10:28:07.855678: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-14 10:28:07.855932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.3165
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.36GiB
2018-09-14 10:28:07.855950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2)
Loading model weights from data/demo_models/vgg16_fcn_color_single_frame_2d_pose_add_lov_iter_160000.ckpt
data/demo_images/000001-color.png
2018-09-14 10:28:08.811070: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.76GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-09-14 10:28:08.892910: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2018-09-14 10:28:08.932697: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.64GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
cudaCheckError() failed compute outputs: too many resources requested for launch
Command exited with non-zero status 255
2.63user 1.50system 0:03.36elapsed 123%CPU (0avgtext+0avgdata 1966940maxresident)k
0inputs+0outputs (0major+441177minor)pagefaults 0swaps

I have already decreased kThreadsPerBlock from 1024 to 512 in the different layers as you mentioned, but it didn't work for me. Do you have any other ideas? Thanks.
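
If 512 still fails, one way to narrow it down is to ask the CUDA runtime what the compiled kernel can actually support on the installed GPU. The sketch below is not code from this repository; compute_outputs_kernel is only a stand-in for the custom-op kernels in the .cu files, which are what you would really query:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for one of the custom-op kernels; query the real kernels instead.
__global__ void compute_outputs_kernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i];
}

int main() {
  // Per-thread register usage and the hard per-block thread limit that the
  // compiled kernel allows on this device.
  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, compute_outputs_kernel);
  std::printf("regs/thread: %d, max threads/block for this kernel: %d\n",
              attr.numRegs, attr.maxThreadsPerBlock);

  // Block size the occupancy calculator recommends for this kernel.
  int min_grid = 0, block = 0;
  cudaOccupancyMaxPotentialBlockSize(&min_grid, &block,
                                     compute_outputs_kernel, 0, 0);
  std::printf("suggested threads per block: %d\n", block);
  return 0;
}
```

If the reported per-kernel limit is below the block size used at launch, the launch fails with exactly this error, so a value at or below the suggestion should let the kernels start.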

AlmostMiao · Sep 14 '18

I'm facing the same issue and tried solving it the same way @briliantnugraha describes here in issue #65:

Hi @hamza-04, you could try two things (which worked for me):

1. Reduce the batch size:

   * Open PoseCNN\experiments\cfgs\lov_color_2d.yml
   * Change the "IMS_PER_BATCH" option from 2 to 1

2. Reduce kThreadsPerBlock from 1024 to 512, as suggested by @Kaju-Bubanja, in all files (use VS Code or another editor for a faster bulk edit); a sketch of the usual launch pattern follows below.
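
For step 2, the edit is normally behavior-preserving because the grid size is computed from the same constant, so lowering it just launches more, smaller blocks over the same number of elements. A sketch of the common pattern, with hypothetical kernel and function names rather than the repository's actual code:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-element kernel standing in for the custom-op kernels.
__global__ void scale_kernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * in[i];
}

// The constant the suggestion refers to; it appears in several .cu files.
const int kThreadsPerBlock = 512;  // lowered from 1024

void launch_scale(const float* in, float* out, int n) {
  // ceil(n / kThreadsPerBlock) blocks, so every element is still covered
  // after the constant is lowered.
  int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
  scale_kernel<<<blocks, kThreadsPerBlock>>>(in, out, n);
}
```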

Unfortunately, just like @TotoLulu94 and @AlmostMiao, I was not able to solve the issue this way. Is there another way to debug it? What are the minimum GPU requirements? Thanks.

nico-buehler · Sep 24 '19