caffe-yolov3
Running detectnet fails with the error "no kernel image is available for execution on the device" in forward_yolo_layer_gpu
    I0809 17:26:14.176947 22577 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
    num_inputs is 1
    num_outputs is 3
    I0809 17:26:14.237879 22577 detectnet.cpp:78] Input data layer channels is 3
    I0809 17:26:14.237900 22577 detectnet.cpp:79] Input data layer width is 416
    I0809 17:26:14.237920 22577 detectnet.cpp:80] Input data layer height is 416
    output blob1 shape c= 255, h = 13, w = 13
    output blob2 shape c= 255, h = 26, w = 26
    output blob3 shape c= 255, h = 52, w = 52
    blobs.size()=3
    0-step1
    0-step2
    0-step3
    forward_yolo_layer_gpu 1
    43095 a8240000 63800000 a826a15c 6382a15c
    forward_yolo_layer_gpu 2
    CUDA Error: no kernel image is available for execution on the device
    CUDA Error: no kernel image is available for execution on the device: Resource temporarily unavailable
Which version of Caffe did you use? Can you share a link to the right version of Caffe?
This may be a problem with your CUDA setup; please check it.
The error occurs at the first line, 'copy_gpu(l.batch*l.inputs,(float*)input,1,l.output_gpu,1);', of the function forward_yolo_layer_gpu in yolo_layer.cpp. I added some print statements to this function and found it there.
    void forward_yolo_layer_gpu(const float* input,layer l)
    {
        printf("before 11111\n");
        copy_gpu(l.batch*l.inputs,(float*)input,1,l.output_gpu,1);
        printf("after 11111\n");
        int b,n;
        for(b = 0;b < l.batch;++b){
            for(n =0;n< l.n;++n){
                int index = entry_index(l,b,n*l.w*l.h,0);
                activate_array_gpu(l.output_gpu + index, 2*l.w*l.h,LOGISTIC);
                index = entry_index(l,b,n*l.w*l.h,4);
                activate_array_gpu(l.output_gpu + index,(1 + l.classes)*l.w*l.h,LOGISTIC);
            }
        }
        cuda_pull_array(l.output_gpu,l.output,l.batch*l.outputs);
    }
After running detectnet, I got the following error:
    I0824 18:42:00.235325 16998 detectnet.cpp:76] Input data layer channels is 3
    I0824 18:42:00.235347 16998 detectnet.cpp:77] Input data layer width is 416
    I0824 18:42:00.235353 16998 detectnet.cpp:78] Input data layer height is 416
    output blob1 shape c= 255, h = 13, w = 13
    output blob2 shape c= 255, h = 26, w = 26
    output blob3 shape c= 255, h = 52, w = 52
    before 11111
    CUDA Error: no kernel image is available for execution on the device
    detectnet: /home/LiuQiang/ext_work/caffe-yolov3/cuda.cpp:30: void check_error(cudaError_t): Assertion `0' failed.
    Aborted (core dumped)
This code was cloned from your GitHub caffe-yolov3 project, with no changes or fixes.
Maybe the compute capability (compute_XX / sm_XX) the build targets does not match your GPU? Please check your GPU's compute capability against the -gencode flags. For example, this is what I have in my CMakeLists.txt:

    # setup CUDA
    find_package(CUDA)

    set(
        CUDA_NVCC_FLAGS
        ${CUDA_NVCC_FLAGS};
        -O3
        -gencode arch=compute_53,code=sm_53 #tegra tx1
        -gencode arch=compute_61,code=sm_61 #gtx 1060
        -gencode arch=compute_62,code=sm_62 #tegra tx2
    )
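To make the check above concrete, here is a minimal C++ sketch that builds the `-gencode` entry for a given compute capability and tests whether a flag string already contains it. The helper names `gencode_flag` and `targets_capability` are hypothetical and not part of caffe-yolov3. The major/minor numbers are assumed to come from your GPU's compute capability, for example the `major` and `minor` fields of `cudaDeviceProp` filled in by `cudaGetDeviceProperties`.

```cpp
#include <string>

// Hypothetical helper: build the nvcc -gencode entry for one compute
// capability, e.g. (6, 1) gives "-gencode arch=compute_61,code=sm_61".
// major/minor are assumed to be the GPU's compute capability numbers.
std::string gencode_flag(int major, int minor) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

// Hypothetical helper: return true if the NVCC flag string already
// contains the -gencode entry for this compute capability.
bool targets_capability(const std::string& flags, int major, int minor) {
    return flags.find(gencode_flag(major, minor)) != std::string::npos;
}
```

If `targets_capability` returns false for your GPU, adding the string from `gencode_flag` to `CUDA_NVCC_FLAGS` and rebuilding is the kind of change suggested above. The substring match is deliberately simple and only recognizes flags written in exactly this form.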
Yes, I checked it, and it runs fine now. Thanks. But it is slower than the original.
Yes, Caffe is slower than darknet. I suggest speeding it up with TensorRT.
Hi @anhuipl2010, what speed did you get for this? I mean FPS.
Hi @anhuipl2010, how many FPS did you get for this?