TensorRT-Yolov3

[Training] Can you give me the GitHub repo for training the model on Caffe?

Open uname0x96 opened this issue 6 years ago • 34 comments

Hi @lewes6369, I'm very happy because your repo has helped me so much in my work. But can you tell me which link you used for training? I'm a noob in Caffe, so I tried converting from Keras or TensorFlow to Caffe, but it is hard and I ran into many bugs. Can you help me, please?

uname0x96 avatar Mar 28 '19 11:03 uname0x96

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official YOLOv3 repo: https://github.com/pjreddie/darknet. Next, convert it to a caffemodel by git and git. You can also try https://github.com/eric612/MobileNet-YOLO, which trains the YOLO model directly in the Caffe framework.

lewes6369 avatar Mar 31 '19 10:03 lewes6369
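For reference, training with the official darknet repo is a one-line command; the .data and .cfg paths below are the standard examples from the darknet README, so substitute your own dataset definition:

    # train a detector with the official darknet repo;
    # darknet53.conv.74 are the pretrained backbone weights from the YOLO homepage
    ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74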

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official YOLOv3 repo: https://github.com/pjreddie/darknet. Next, convert it to a caffemodel by git and git. You can also try https://github.com/eric612/MobileNet-YOLO, which trains the YOLO model directly in the Caffe framework.

I got it. Thanks for your work :D

uname0x96 avatar Mar 31 '19 11:03 uname0x96

Hi @lewes6369, I resolved my problem, but I have one question because I don't understand something. I took the pretrained yolov3 from darknet and trained it on just one class, my cane. So I configured it for 1 class and set filters = (classes + 5) * 3 = (1 + 5) * 3 = 18. Then I converted it to a caffe model and used the following command to detect my cane:

    ./install/runYolov3 --caffemodel=./model/yolov3_cane.caffemodel --prototxt=./model/yolov3_cane_trt.prototxt --W=416 --H=416 --class=1 --mode=fp16 --input=./doge.jpg

It gets a core dump. But if I change --class=80 it works, and class 0 changes from person (as in your yolov3 model) to cane with my model. Also, although I trained only one class, it detects many other objects, like the COCO model. What does this mean? And can I make the output model detect only my cane class? I think that would improve my performance. Thanks for reading!

uname0x96 avatar Apr 04 '19 04:04 uname0x96

Hi @cong235. Did you modify CLASS_NUM in the file tensorRTWrapper/code/include/YoloConfigs.h to one class? The class number has to be changed not only on the command line but also in this header. I will merge these two settings into the command line later.

lewes6369 avatar Apr 07 '19 14:04 lewes6369
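A minimal sketch of the header change lewes6369 describes; the exact declaration in YoloConfigs.h may differ, but the idea is that the class count is compiled in:

    // tensorRTWrapper/code/include/YoloConfigs.h (sketch)
    static constexpr int CLASS_NUM = 1;  // was 80; must match --class=1 on the command line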

Hi @cong235. Did you modify CLASS_NUM in the file tensorRTWrapper/code/include/YoloConfigs.h to one class? The class number has to be changed not only on the command line but also in this header. I will merge these two settings into the command line later.

Thanks for your reply. I got it; I had forgotten that :D P.S.: Hmm, can I write a custom layer without coding for CUDA?

uname0x96 avatar Apr 07 '19 15:04 uname0x96

Yes. If you do not code it in CUDA, you have to run the custom layer on the CPU, and that costs time for the communication between CPU memory and the GPU device. Running only the last, low-compute layer on the CPU is just about OK.

lewes6369 avatar Apr 07 '19 15:04 lewes6369
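A minimal sketch of that CPU fallback inside a TensorRT plugin's enqueue(); mCount and forwardCpu are hypothetical names, and the two copies plus the synchronize are exactly the CPU/GPU communication cost described above:

    // CPU path of a custom plugin: copy device -> host, compute, copy back.
    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void* workspace, cudaStream_t stream) /* override */
    {
        std::vector<float> host(mCount);                 // host staging buffer
        cudaMemcpyAsync(host.data(), inputs[0], mCount * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);                   // wait for the download
        forwardCpu(host.data(), mCount);                 // hypothetical CPU impl
        cudaMemcpyAsync(outputs[0], host.data(), mCount * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        return 0;
    }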

Yes. If you do not code it in CUDA, you have to run the custom layer on the CPU, and that costs time for the communication between CPU memory and the GPU device. Running only the last, low-compute layer on the CPU is just about OK.

You mean that after converting the caffe model to a TRT engine, that layer will just run on the CPU instead of the GPU, right?

uname0x96 avatar Apr 08 '19 10:04 uname0x96

After converting to a TRT engine, all supported layers will run on the GPU. A custom layer can run on either the CPU or the GPU, depending on your implementation. If you want to run it on the GPU, just code it in CUDA. See the code in YoloLayer.cu; it contains both the CPU and GPU implementations.

lewes6369 avatar Apr 27 '19 16:04 lewes6369
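For the GPU path, the custom layer boils down to launching a CUDA kernel on the stream TensorRT passes in. A toy kernel in the spirit of YoloLayer.cu (the name and launch configuration are illustrative, not the repo's actual code):

    // Apply the YOLO sigmoid activation elementwise on the GPU.
    __global__ void yoloActivate(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = 1.0f / (1.0f + expf(-in[i]));
    }

    // launched from enqueue() as:
    // yoloActivate<<<(n + 255) / 256, 256, 0, stream>>>(in, out, n);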

After converting to a TRT engine, all supported layers will run on the GPU. A custom layer can run on either the CPU or the GPU, depending on your implementation. If you want to run it on the GPU, just code it in CUDA. See the code in YoloLayer.cu; it contains both the CPU and GPU implementations.

Thanks for your reply. Hmm, I tried using the same prototxt config with yolo_tiny, but it's not working. So is there some difference between yolov3 and yolov3_tiny? I need to change the code in the TensorRT wrapper, right?

uname0x96 avatar May 04 '19 02:05 uname0x96

Hi @lewes6369, I'm working with a Jetson TX2 for real-time detection. With batchsize = 1 and yolov3 it takes 140 ms per image, about 7 FPS, and with batchsize = 4 it takes 530 ms per batch, about 7 FPS as well. Can you give me some suggestions for increasing that? My goal is 15 FPS with batchsize = 4. I'm trying to shrink the yolo kernels, CHECK_COUNT, and the number of anchor boxes, but I still can't gain much. Do you have any suggestions? Thanks for reading :dancer:

uname0x96 avatar May 04 '19 03:05 uname0x96

Thanks for your reply. Hmm, I tried using the same prototxt config with yolo_tiny, but it's not working. So is there some difference between yolov3 and yolov3_tiny? I need to change the code in the TensorRT wrapper, right?

Hi cong235,

I am also trying to run tiny-yolo-3l.cfg. Have you already managed to change the TensorRT wrapper for tiny yolo? Can you please let me know the changes you made?

Thanks

aditbhrgv avatar May 06 '19 14:05 aditbhrgv

Thanks for your reply. Hmm, I tried using the same prototxt config with yolo_tiny, but it's not working. So is there some difference between yolov3 and yolov3_tiny? I need to change the code in the TensorRT wrapper, right?

Hi cong235,

I am also trying to run tiny-yolo-3l.cfg. Have you already managed to change the TensorRT wrapper for tiny yolo? Can you please let me know the changes you made?

Thanks

Yes, I was running yolov3_tiny. You should change the TensorRT wrapper and YoloConfigs.h:

    // YoloConfigs.h (YOLO 416)
    YoloKernel yolo1 = { 13, 13, {81,82, 135,169, 344,319} };
    YoloKernel yolo2 = { 26, 26, {10,14, 23,27, 37,58} };

And comment out the line mYoloKernel.push_back(yolo3); in YoloLayer.cu. Now you can run it.

uname0x96 avatar May 07 '19 09:05 uname0x96

@cong235 - Actually I am running the tiny-yolo-3l.cfg model. According to this model, I changed the YoloConfigs.h file with the appropriate anchors and CLASS_NUM = 2 (because I have 2 classes to detect). But the inference code doesn't give me any detections. Have you faced a similar problem? Do we need to change something else? Thanks for your help in advance!

aditbhrgv avatar May 07 '19 11:05 aditbhrgv

@cong235 - Actually I am running the tiny-yolo-3l.cfg model. According to this model, I changed the YoloConfigs.h file with the appropriate anchors and CLASS_NUM = 2 (because I have 2 classes to detect). But the inference code doesn't give me any detections. Have you faced a similar problem? Do we need to change something else? Thanks for your help in advance!

You need to change your tiny config too: comment out the "upsample_param" blocks, and modify the last layer of the prototxt as:

    layer {
      # the bottoms are the yolo input layers
      bottom: "layer16-conv"
      bottom: "layer23-conv"
      top: "yolo-det"
      name: "yolo-det"
      type: "Yolo"
    }

uname0x96 avatar May 07 '19 11:05 uname0x96

@cong235 I already did that! Refer to my .cfg file attached! Still I can't get any detections. Maybe some hard-coded confidence thresholds?

    name: "Darkent2Caffe"
    input: "data"
    input_dim: 1
    input_dim: 3
    input_dim: 608
    input_dim: 608

layer { bottom: "data" top: "layer1-conv" name: "layer1-conv" type: "Convolution" convolution_param { num_output: 16 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer1-conv" top: "layer2-maxpool" name: "layer2-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer2-maxpool" top: "layer3-conv" name: "layer3-conv" type: "Convolution" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer3-conv" top: "layer4-maxpool" name: "layer4-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer4-maxpool" top: "layer5-conv" name: "layer5-conv" type: "Convolution" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer5-conv" top: "layer6-maxpool" name: "layer6-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer6-maxpool" top: "layer7-conv" name: "layer7-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer7-conv" top: "layer8-maxpool" name: "layer8-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer8-maxpool" top: "layer9-conv" name: "layer9-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer9-conv" top: "layer10-maxpool" name: "layer10-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer10-maxpool" top: "layer11-conv" name: 
"layer11-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer11-conv" top: "layer12-maxpool" name: "layer12-maxpool" type: "Pooling" pooling_param { stride: 1 pool: MAX kernel_size: 3 pad: 1 } } layer { bottom: "layer12-maxpool" top: "layer13-conv" name: "layer13-conv" type: "Convolution" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer13-conv" top: "layer14-conv" name: "layer14-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer14-conv" top: "layer15-conv" name: "layer15-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer15-conv" top: "layer16-conv" name: "layer16-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer14-conv" top: "layer18-route" name: "layer18-route" type: "Concat" } layer { bottom: "layer18-route" top: "layer19-conv" name: "layer19-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer19-conv" top: "layer20-upsample" name: "layer20-upsample" type: "Upsample" #upsample_param { # scale: 2 #} } layer { bottom: "layer20-upsample" bottom: "layer9-conv" top: "layer21-route" name: "layer21-route" type: "Concat" } layer { bottom: "layer21-route" top: "layer22-conv" name: "layer22-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 
pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer22-conv" top: "layer23-conv" name: "layer23-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer22-conv" top: "layer25-route" name: "layer25-route" type: "Concat" } layer { bottom: "layer25-route" top: "layer26-conv" name: "layer26-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer26-conv" top: "layer27-upsample" name: "layer27-upsample" type: "Upsample" #upsample_param { # scale: 2 #} } layer { bottom: "layer27-upsample" bottom: "layer7-conv" top: "layer28-route" name: "layer28-route" type: "Concat" } layer { bottom: "layer28-route" top: "layer29-conv" name: "layer29-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer29-conv" top: "layer30-conv" name: "layer30-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } }

    layer {
      bottom: "layer16-conv"
      bottom: "layer23-conv"
      bottom: "layer30-conv"
      top: "yolo-det"
      name: "yolo-det"
      type: "Yolo"
    }

aditbhrgv avatar May 07 '19 11:05 aditbhrgv

@aditbhrgv You just need to do a few things to use another YOLO model:

  1. Edit the .cfg file.
  2. Change the class number (CLASS_NUM and --class).
  3. Edit the YoloKernel entries so the anchors match your anchor boxes.
  4. Check your model again.

uname0x96 avatar May 07 '19 11:05 uname0x96
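A quick sanity check that ties the steps together: each YOLO detection conv needs filters = (classes + 5) * 3, so with 2 classes that is (2 + 5) * 3 = 21, which is exactly the num_output: 21 of layer16-conv, layer23-conv, and layer30-conv in the prototxt above.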

@cong235 thank you! It works now. However, my int8 mode is not working. I am using TensorRT 5.0.2.6 (is there some dependency on the TensorRT version?). I chose 10 images for the calibration dataset, which is a subset of my validation dataset.

####### input args####### C=3; H=608; W=608; batchsize=1; caffemodel=yolov3-3l.caffemodel; calib=calib_sample.txt; class=2; enginefile=; evallist=; input=000000.jpg; mode=int8; nms=0.450000; outputs=yolo-det; prototxt=yolov3-3l.prototxt; ####### end args####### find calibration file,loading ... init plugin proto: yolov3-3l.prototxt caffemodel: yolov3-3l.caffemodel create calibrator,Named:yolov3-3l Begin parsing model... End parsing model... setInt8Mode Begin building engine... End building engine... save Engine...yolov3_int8.engine process: 000000.jpg Time taken for inference is 9.37288 ms. Det count is 0 Time taken for nms is 0.001386 ms. layer1-conv input reformatter 0 0.038ms layer1-conv 0.130ms layer1-act 0.134ms layer2-maxpool 0.093ms layer3-conv input reformatter 0 0.037ms layer3-conv 0.071ms layer3-act 0.069ms layer4-maxpool 0.049ms layer5-conv input reformatter 0 0.020ms layer5-conv 0.046ms layer5-act 0.036ms layer6-maxpool 0.030ms layer7-conv input reformatter 0 0.011ms layer7-conv 0.038ms layer7-act 0.017ms layer8-maxpool 0.016ms layer9-conv input reformatter 0 0.006ms layer9-conv 0.046ms layer9-act 0.006ms layer10-maxpool 0.007ms layer11-conv input reformatter 0 0.004ms layer11-conv 0.070ms layer11-act 0.004ms layer12-maxpool 0.010ms layer13-conv input reformatter 0 0.005ms layer13-conv 0.137ms layer13-act input reformatter 0 0.011ms layer13-act 0.005ms layer14-conv input reformatter 0 0.008ms layer14-conv 0.038ms layer14-act 0.004ms layer14-act output reformatter 0 0.005ms layer15-conv 0.053ms layer15-act 0.004ms layer16-conv 0.016ms layer14-conv copy 0.006ms layer19-conv 0.012ms layer19-act 0.003ms layer20-upsample 0.010ms layer20-upsample copy 0.007ms layer9-conv copy 0.010ms layer22-conv 0.099ms layer22-act 0.006ms layer23-conv 0.015ms layer22-conv copy 0.009ms layer26-conv 0.023ms layer26-act 0.005ms layer27-upsample 0.029ms layer27-upsample copy 0.019ms layer7-conv copy 0.018ms layer29-conv 0.104ms layer29-act 0.017ms layer30-conv 0.031ms yolo-det 6.559ms Time over all layers: 8.263

aditbhrgv avatar May 07 '19 12:05 aditbhrgv

@aditbhrgv Hi: I'm also trying to use TensorRT for real-time object detection tasks on a TX2 recently, and I believe the TX2 does not support int8 mode. It works for Xavier and some GeForce cards.

zeyuDai2018 avatar May 20 '19 10:05 zeyuDai2018

@aditbhrgv "not supported" does not mean "not running", bro. "Not supported" means you can't run it at the full INT8 rate. But in real life some models can run very fast with INT8; yolov3-tiny can run at 66 FPS on a TX2 with INT8.

uname0x96 avatar May 21 '19 02:05 uname0x96

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official YOLOv3 repo: https://github.com/pjreddie/darknet. Next, convert it to a caffemodel by git and git. You can also try https://github.com/eric612/MobileNet-YOLO, which trains the YOLO model directly in the Caffe framework.

I got it. Thanks for your work :D

Hi, it seems that https://github.com/ChenYingpeng/caffe-yolov3/tree/master/model_convert doesn't exist. Are there any other methods?

mxzhao avatar May 28 '19 01:05 mxzhao

@cong235 thank you! It works now. However, my int8 mode is not working. I am using TensorRT 5.0.2.6 (is there some dependency on the TensorRT version?). I chose 10 images for the calibration dataset, which is a subset of my validation dataset.

####### input args####### C=3; H=608; W=608; batchsize=1; caffemodel=yolov3-3l.caffemodel; calib=calib_sample.txt; class=2; enginefile=; evallist=; input=000000.jpg; mode=int8; nms=0.450000; outputs=yolo-det; prototxt=yolov3-3l.prototxt; ####### end args####### find calibration file,loading ... init plugin proto: yolov3-3l.prototxt caffemodel: yolov3-3l.caffemodel create calibrator,Named:yolov3-3l Begin parsing model... End parsing model... setInt8Mode Begin building engine... End building engine... save Engine...yolov3_int8.engine process: 000000.jpg Time taken for inference is 9.37288 ms. Det count is 0 Time taken for nms is 0.001386 ms. layer1-conv input reformatter 0 0.038ms layer1-conv 0.130ms layer1-act 0.134ms layer2-maxpool 0.093ms layer3-conv input reformatter 0 0.037ms layer3-conv 0.071ms layer3-act 0.069ms layer4-maxpool 0.049ms layer5-conv input reformatter 0 0.020ms layer5-conv 0.046ms layer5-act 0.036ms layer6-maxpool 0.030ms layer7-conv input reformatter 0 0.011ms layer7-conv 0.038ms layer7-act 0.017ms layer8-maxpool 0.016ms layer9-conv input reformatter 0 0.006ms layer9-conv 0.046ms layer9-act 0.006ms layer10-maxpool 0.007ms layer11-conv input reformatter 0 0.004ms layer11-conv 0.070ms layer11-act 0.004ms layer12-maxpool 0.010ms layer13-conv input reformatter 0 0.005ms layer13-conv 0.137ms layer13-act input reformatter 0 0.011ms layer13-act 0.005ms layer14-conv input reformatter 0 0.008ms layer14-conv 0.038ms layer14-act 0.004ms layer14-act output reformatter 0 0.005ms layer15-conv 0.053ms layer15-act 0.004ms layer16-conv 0.016ms layer14-conv copy 0.006ms layer19-conv 0.012ms layer19-act 0.003ms layer20-upsample 0.010ms layer20-upsample copy 0.007ms layer9-conv copy 0.010ms layer22-conv 0.099ms layer22-act 0.006ms layer23-conv 0.015ms layer22-conv copy 0.009ms layer26-conv 0.023ms layer26-act 0.005ms layer27-upsample 0.029ms layer27-upsample copy 0.019ms layer7-conv copy 0.018ms layer29-conv 0.104ms layer29-act 0.017ms layer30-conv 0.031ms yolo-det 6.559ms Time over all layers: 8.263

Hi, why did your initial tiny model not work? I just converted the official yolov3-tiny.cfg/weights and made this change in YoloConfigs.h:

    // YOLO 416
    YoloKernel yolo1 = { 13, 13, {81,82, 135,169, 344,319} };
    YoloKernel yolo2 = { 26, 26, {10,14, 23,27, 37,58} };

but it doesn't give any detections. Can you share some of your experience?

mxzhao avatar Jun 08 '19 04:06 mxzhao

@aditbhrgv You just need to do a few things to use another YOLO model:

1. Edit the .cfg file.

2. Change the class number (CLASS_NUM and --class).

3. Edit the YoloKernel entries so the anchors match your anchor boxes.

4. Check your model again.

@cong235 Could you please explain what you mean by step 3, "YoloKernel entries so the anchors match your anchor boxes"? Where do I get the filter size from my yolov3.cfg file? Also, I have 9 pairs of anchors; how do I change my YoloKernel? My config:

    num_classes = 9
    filters = 42 (original yolov3 = 255)
    anchors = 11.3950,25.3481, 21.0826,48.1415, 29.8289,76.1553, 35.8586,132.5984, 66.4218,89.7861, 92.6243,139.2145, 164.5912,141.0014, 140.5117,216.2245, 238.7294,323.5685

Could you please show me an example?

prajwaljpj avatar Jul 26 '19 14:07 prajwaljpj

@prajwaljpj Hi, what is your kernel feature size? If it is a 416 input, try this:

    YoloKernel yolo1 = { 13, 13, {164.5912,141.0014, 140.5117,216.2245, 238.7294,323.5685} };
    YoloKernel yolo2 = { 26, 26, {35.8586,132.5984, 66.4218,89.7861, 92.6243,139.2145} };
    YoloKernel yolo3 = { 52, 52, {11.3950,25.3481, 21.0826,48.1415, 29.8289,76.1553} };

and make sure each anchor group belongs to the corresponding kernel (the largest anchors go with the coarsest 13x13 grid, the smallest with the 52x52 grid).

lewes6369 avatar Jul 27 '19 15:07 lewes6369

@lewes6369 Thanks for the quick reply. Yes, the kernel feature size is 416, and it worked!

prajwaljpj avatar Jul 28 '19 14:07 prajwaljpj

@cong235 Hello, my yolov3-tiny always reports an error:

    ####### input args####### C=3; H=416; W=416; caffemodel=./yolov3-tiny.caffemodel; calib=; cam=0; class=80; classname=coco.name; display=1; evallist=; input=0; inputstream=cam; mode=fp32; nms=0.450000; outputs=yolo-det; prototxt=./yolov3-tiny-trt.prototxt; savefile=result; saveimg=0; videofile=sample.mp4; ####### end args#######
    init plugin
    proto: ./yolov3-tiny-trt.prototxt
    caffemodel: ./yolov3-tiny.caffemodel
    Begin parsing model...
    ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis
    runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
    Aborted (core dumped)

Can you give me your yolov3-tiny?

liteonandy avatar Dec 06 '19 06:12 liteonandy

@cong235 My yolov3-tiny-trt.prototxt:

    name: "Darkent2Caffe"
    input: "data"
    input_dim: 1
    input_dim: 3
    input_dim: 416
    input_dim: 416

layer { bottom: "data" top: "layer1-conv" name: "layer1-conv" type: "Convolution" convolution_param { num_output: 16 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer1-conv" top: "layer2-maxpool" name: "layer2-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer2-maxpool" top: "layer3-conv" name: "layer3-conv" type: "Convolution" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer3-conv" top: "layer4-maxpool" name: "layer4-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer4-maxpool" top: "layer5-conv" name: "layer5-conv" type: "Convolution" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer5-conv" top: "layer6-maxpool" name: "layer6-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer6-maxpool" top: "layer7-conv" name: "layer7-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer7-conv" top: "layer8-maxpool" name: "layer8-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer8-maxpool" top: "layer9-conv" name: "layer9-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer9-conv" top: "layer10-maxpool" name: "layer10-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer10-maxpool" top: "layer11-conv" name: 
"layer11-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer11-conv" top: "layer12-maxpool" name: "layer12-maxpool" type: "Pooling" pooling_param { stride: 1 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer12-maxpool" top: "layer13-conv" name: "layer13-conv" type: "Convolution" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer13-conv" top: "layer14-conv" name: "layer14-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer14-conv" top: "layer15-conv" name: "layer15-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer15-conv" top: "layer16-conv" name: "layer16-conv" type: "Convolution" convolution_param { num_output: 255 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer14-conv" top: "layer18-route" name: "layer18-route" type: "Concat" } layer { bottom: "layer18-route" top: "layer19-conv" name: "layer19-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer19-conv" top: "layer20-upsample" name: "layer20-upsample" type: "Upsample" #upsample_param { # scale: 2 #} } layer { bottom: "layer20-upsample" bottom: "layer9-conv" top: "layer21-route" name: "layer21-route" type: "Concat" } layer { bottom: "layer21-route" top: "layer22-conv" name: "layer22-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 
pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer22-conv" top: "layer23-conv" name: "layer23-conv" type: "Convolution" convolution_param { num_output: 255 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer16-conv" bottom: "layer23-conv" top: "yolo-det" name: "yolo-det" type: "Yolo" }

liteonandy avatar Dec 06 '19 06:12 liteonandy

@cong235 Hello, my yolov3-tiny always reports an error:

    ####### input args####### C=3; H=416; W=416; caffemodel=./yolov3-tiny.caffemodel; calib=; cam=0; class=80; classname=coco.name; display=1; evallist=; input=0; inputstream=cam; mode=fp32; nms=0.450000; outputs=yolo-det; prototxt=./yolov3-tiny-trt.prototxt; savefile=result; saveimg=0; videofile=sample.mp4; ####### end args#######
    init plugin
    proto: ./yolov3-tiny-trt.prototxt
    caffemodel: ./yolov3-tiny.caffemodel
    Begin parsing model...
    ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis
    runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
    Aborted (core dumped)

Can you give me your yolov3-tiny?

I am having the same issue. I get the error:

    Begin parsing model...
    ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis (0), but dimensions mismatched at input 1 at index 1. Input 0 shape: [128,24,24], Input 1 shape: [256,26,26]
    runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
    Aborted (core dumped)

Can anyone help me with this problem?

sandeepjangir07 avatar Dec 19 '19 10:12 sandeepjangir07

@sandeepjangir07 You should first try a simple model taken from the yolov3 homepage. And please confirm that the new "Yolo" layer has been added to your yolov3-tiny-trt.prototxt.

uname0x96 avatar Dec 23 '19 01:12 uname0x96
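For what it's worth, the shapes in the error above ([128,24,24] vs [256,26,26]) suggest the 13x13 grid is being shrunk to 12x12 before the upsample. In the yolov3-tiny-trt.prototxt posted above, layer12-maxpool uses kernel_size: 2, pad: 0, stride: 1, which maps 13x13 to 12x12; the working tiny prototxt earlier in this thread keeps the size instead:

    layer {
      bottom: "layer11-conv"
      top: "layer12-maxpool"
      name: "layer12-maxpool"
      type: "Pooling"
      pooling_param { stride: 1 pool: MAX kernel_size: 3 pad: 1 }
    }

With kernel_size: 3 and pad: 1 the output stays 13x13, so the upsample yields 26x26 and the concat with layer9-conv matches.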

@cong235 Hi, you mentioned that we can run int8 mode on the TX2 and that it's faster than fp16. However, I set the mode to int8 and calibrated with 30 pictures, and I got almost the same speed on Xavier for yolo-tiny (about 85 FPS). I want to know why, and is there a way to force the weights to int8 without calibration? I can see that the int8 engine is just a little smaller than the fp16 engine, which means there is nearly no difference between my engines.

zeyuDai2018 avatar Jan 15 '20 09:01 zeyuDai2018

@zeyuDai2018 Yes, but that is just your engine, not every engine, bro. For example, consider the two operations below. If you have two nodes with weights 2.0 and 8.0:

    FP16: 2.0 * 8.0 = 16.0
    INT8: 2   * 8   = 16

The result is the same. But if your weights are 2.1 and 8.1:

    FP16: 2.1 * 8.1 = 17.01
    INT8: 2   * 8   = 16

The result is different.

I mean, INT8 is lighter than FP16, but that does not mean INT8 is faster than FP16. Whether it is faster depends on your model and your weights.

uname0x96 avatar Jan 16 '20 08:01 uname0x96
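A self-contained toy, not from the repo, that reproduces the arithmetic above: quantizing the weights to the nearest integer (i.e., INT8 with a scale of 1) drops the fractional part, which is the gap calibration tries to minimize by picking better scales:

    // Toy INT8-style rounding, illustrating uname0x96's example (C++).
    #include <cmath>
    #include <cstdio>

    int main() {
        float a = 2.1f, b = 8.1f;
        int qa = static_cast<int>(std::lround(a));  // -> 2
        int qb = static_cast<int>(std::lround(b));  // -> 8
        std::printf("fp:   %.2f\n", a * b);         // 17.01
        std::printf("int8: %d\n", qa * qb);         // 16
        return 0;
    }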