darknet-nnpack
Segmentation fault error
Hi, when running detection on a Raspberry Pi, I get 1.8 fps with yolov2-tiny-voc, but with weights trained on custom objects I get a segmentation fault.
A few questions: Does it work on regular Darknet on PC? Does it work with this repository when compiled for PC? And is there a command line and log showing where it fails?
If your custom network is as complex as full yolo2/yolo3, it won't work on the Pi. Fast convolutions need a lot of memory and you'll run out.
Hi @shizukachan, thanks for the reply. It is working now. The problem was the large batch size (64) and subdivisions, which the Raspberry Pi can't handle, so I reduced them to batch=1 and subdivisions=1 and reduced the image size to 256x256.
Do you have any suggestions for optimizing the cfg file for 2 classes to increase fps on the Raspberry Pi?
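For anyone hitting the same crash, the change described above would look roughly like this in the `[net]` section of the cfg file. This is a sketch of only the four keys mentioned in this thread; every other line of the cfg is assumed to stay as it was.

```
[net]
# Inference on the Pi: one image per batch, no subdivision splitting
batch=1
subdivisions=1
# Smaller input than the usual 416x416; must stay a multiple of 32
width=256
height=256
```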
You can measure the time it takes to run each layer and try reducing the number of filters in the most expensive ones. Those tend to be the last few layers.
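As a rough illustration of why cutting filters helps, the multiply-accumulate count of a convolutional layer grows linearly with its filter count. The helper `conv_macs` below is hypothetical (it is not part of darknet), and the layer shape in the usage note is an illustrative assumption, not a measurement from this thread.

```c
#include <stdio.h>

/* Hypothetical helper, not part of darknet.
 * Multiply-accumulates for one convolutional layer:
 * every output pixel of every filter needs size*size*channels MACs. */
static long long conv_macs(int out_w, int out_h, int channels,
                           int filters, int size)
{
    return (long long)out_w * out_h * filters * size * size * channels;
}

/* Print the cost of a layer before and after halving its filter count. */
static void print_filter_savings(int out_w, int out_h, int channels,
                                 int filters, int size)
{
    long long full = conv_macs(out_w, out_h, channels, filters, size);
    long long half = conv_macs(out_w, out_h, channels, filters / 2, size);
    printf("%d filters: %lld MACs, %d filters: %lld MACs\n",
           filters, full, filters / 2, half);
}
```

For example, `print_filter_savings(8, 8, 512, 1024, 3)` compares a 3x3 layer with 512 input channels on an 8x8 feature map at 1024 and 512 filters. Note that reducing a layer's filters also reduces the input channels of the next layer, so the following layer gets cheaper too.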
Thank you very much @shizukachan. To start with, your repo helped a lot to break the 1 fps barrier on the Raspberry Pi. So far I have achieved 4 fps and am still working on it. I will make it public once it is complete. Thank you.
The time-each-layer method is basically to record the time at the beginning and at the end of each convolutional layer and subtract the start time from the end time. The call I timed is the call to NNPACK or GEMM.
This allowed me to:
- determine which layer was running slowly on the QPU for QPU-darknet. Once I found that QPU MKL on a single layer runs longer than the entire network on the original NNPACK version, I stopped trying to optimize it.
- train a version of tiny-yolo v2 that specifically addressed that layer, by decreasing the filter count. On VOC this definitely hurts accuracy, but I was able to speed the network up by 25% or so. With fewer classes, YMMV, but it should not hurt as much.
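The time-each-layer method described above can be sketched as follows. This is a minimal standalone example, not the code from the repository: `run_layer` is a hypothetical stand-in for the per-layer NNPACK or GEMM call, and `NUM_LAYERS` and `layer_seconds` are names made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

#define NUM_LAYERS 4                      /* illustrative layer count */
static double layer_seconds[NUM_LAYERS];  /* elapsed time per layer */

/* Monotonic clock reading in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for the NNPACK or GEMM call of one layer. */
static void run_layer(int index)
{
    volatile double sink = 0.0;
    for (long i = 0; i < (index + 1) * 100000L; i++)
        sink += i * 0.5;
}

/* Time each layer: end time minus start time around the call. */
static void time_layers(void)
{
    for (int i = 0; i < NUM_LAYERS; i++) {
        double start = now_seconds();
        run_layer(i);
        double end = now_seconds();
        layer_seconds[i] = end - start;
    }
}

/* Print one line per layer so the most expensive ones stand out. */
static void print_layer_times(void)
{
    for (int i = 0; i < NUM_LAYERS; i++)
        printf("layer %d: %.6f s\n", i, layer_seconds[i]);
}
```

Calling `time_layers()` followed by `print_layer_times()` gives a per-layer breakdown. In a real network the same start/end pair would wrap the actual convolution call, and averaging over several frames gives steadier numbers.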
@shizukachan @ajdhole I'm getting a segmentation fault. I've set batch and subdivisions to 1 and changed the height and width to 256 as well. I'm trying on an RPi 3B+ with 2 classes. Any help would be really appreciated.