libtorch-yolov3
How much faster is it than the pure Python code?
@acgtyrant I tested it on a P40: 46 fps.
Thanks for your amazing work! What about the speed on CPU? As far as I know, OpenCV 3.4.2 supports YOLOv3 inference, taking 220 ms on a 12x Core i7-6850 CPU. Does this run faster? OpenCV's performance figures are here: https://hk.saowen.com/a/8c0f58aa3914c3bef46fb29eb40c77522b25fd7c0672fc9eadb2b3fdc2a8fbfb
@zhangjinsong3 I tested this on CPU (E5-2620 v4, 16 cores); it took 1.89 s. It looks like OpenCV's dnn implementation is much faster.
I did not test on CPU; thanks, @zacario-li.
@walktree Thanks for your work! I tested this on a GPU (1080) and it took 800 ms; on a CPU (i7-7700K) it took 3000 ms. Is something wrong?
Not sure how you are testing, but I got 25 ms on a GPU (1080 Ti).
There is a comment in the code: "It should be known that it takes longer time at first time". Please run the inference more than once and calculate the average time.
For example:
for (int t = 0; t < 10; t++)
{
    auto start = std::chrono::high_resolution_clock::now();

    auto output = net.forward(img_var);

    // filter result by NMS
    // class_num = 80
    // confidence = 0.6
    // nms_conf = 0.4
    auto result = net.write_results(output, 80, 0.6, 0.4);

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    // It should be known that it takes longer time at first time
    std::cout << "inference taken : " << duration.count() << " ms" << std::endl;

    if (result.dim() == 1)
    {
        std::cout << "no object found" << std::endl;
    }
    else
    {
        int obj_num = result.size(0);
        std::cout << obj_num << " objects found" << std::endl;

        // save as image (first iteration only)
        if (t == 0)
        {
            // columns 1-4 hold the box corners (x1, y1, x2, y2) in network-input
            // pixels; scale them back to the original image size
            float w_scale = float(origin_image.cols) / input_image_size;
            float h_scale = float(origin_image.rows) / input_image_size;
            result.select(1, 1).mul_(w_scale);
            result.select(1, 2).mul_(h_scale);
            result.select(1, 3).mul_(w_scale);
            result.select(1, 4).mul_(h_scale);

            // accessor<> reads host memory, so make sure the tensor is on the CPU
            // (.cpu() does not copy if it already is)
            auto result_cpu = result.cpu();
            auto result_data = result_cpu.accessor<float, 2>();
            for (int i = 0; i < result_cpu.size(0); i++)
            {
                cv::rectangle(origin_image,
                              cv::Point(result_data[i][1], result_data[i][2]),
                              cv::Point(result_data[i][3], result_data[i][4]),
                              cv::Scalar(0, 0, 255), 1, 1, 0);
            }
            cv::imwrite("out-det.jpg", origin_image);
        }
    }
}
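The loop above prints each run separately. To get the average the comment asks for, a small helper can run a few untimed warm-up iterations and then average the rest. This is a sketch using only the standard library; `average_latency_ms` is a hypothetical name, and `fn` stands in for whatever you want to time (for example a lambda that calls `net.forward(img_var)` and `net.write_results(...)`). CUDA kernels launch asynchronously, so on a GPU the timed lambda should end with something that waits for the device, such as copying the result to the CPU; otherwise the clock may stop before the work is done.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <ratio>

// Run `fn` `warmup` times without timing (the first runs are slower),
// then time `runs` more calls and return the mean latency in milliseconds.
double average_latency_ms(const std::function<void()>& fn, int warmup, int runs)
{
    assert(warmup >= 0 && runs >= 0);

    for (int i = 0; i < warmup; ++i)
        fn();  // untimed warm-up iterations

    using clock = std::chrono::steady_clock;  // monotonic, suited to measuring intervals
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        auto start = clock::now();
        fn();
        auto end = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return runs > 0 ? total_ms / runs : 0.0;
}
```

A possible call, assuming the `net` and `img_var` from the example: `average_latency_ms([&] { auto out = net.forward(img_var); net.write_results(out, 80, 0.6, 0.4).cpu(); }, 1, 10);`. `steady_clock` is used instead of `high_resolution_clock` because it is guaranteed never to jump backwards.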
Thank you! I tested again: about 35 ms.
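The P40 result earlier in the thread is given in frames per second, while the other reports are per-frame latencies. Assuming one frame is processed at a time, the two are reciprocals, which puts the numbers on the same scale:

```latex
t_{\text{frame}}\,[\text{ms}] = \frac{1000}{\text{fps}}, \qquad
\frac{1000}{46} \approx 21.7\ \text{ms}, \qquad
\frac{1000}{25} = 40\ \text{fps}, \qquad
\frac{1000}{35} \approx 28.6\ \text{fps}
```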
I compiled the libtorch source code on an NVIDIA TX2 (ARM architecture). When I built your YOLO3-libtorch code, I got the error "Darknet.cpp:577:12: error: 'class torch::nn::BatchNormImpl' has no member named 'running_variance'". What should I do? Thanks.
Oh... I checked batchnorm.h and got the idea.
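One way to keep Darknet.cpp compiling against libtorch builds that name the BatchNorm running-variance buffer differently is to pick the member at compile time. This is a sketch, not the project's fix: it assumes the other name is `running_var` (the name used by `torch::nn::BatchNorm2d` in current libtorch; confirm it in your own batchnorm.h), and `OldBatchNorm`, `NewBatchNorm`, `running_var_of` and `load_running_var` are hypothetical names, with `std::vector<float>` buffers standing in for the real `torch::Tensor` members. It needs C++17 (`std::void_t`, `if constexpr`).

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical mock layouts standing in for two libtorch versions.
struct OldBatchNorm { std::vector<float> running_mean, running_variance; };
struct NewBatchNorm { std::vector<float> running_mean, running_var; };

// Detection idiom: true if T has a member named `running_variance`.
template <typename T, typename = void>
struct has_running_variance : std::false_type {};

template <typename T>
struct has_running_variance<T, std::void_t<decltype(std::declval<T&>().running_variance)>>
    : std::true_type {};

// Return the running-variance buffer, whichever name this layout uses.
// The branch not taken is discarded, so the missing member is never referenced.
template <typename BN>
auto& running_var_of(BN& bn)
{
    if constexpr (has_running_variance<BN>::value)
        return bn.running_variance;
    else
        return bn.running_var;
}

// Copy loaded weights into the buffer; sizes must already match.
template <typename BN>
void load_running_var(BN& bn, const std::vector<float>& values)
{
    auto& buf = running_var_of(bn);
    assert(buf.size() == values.size());
    std::copy(values.begin(), values.end(), buf.begin());
}
```

In Darknet.cpp the same pattern could wrap the module's implementation object (for example `running_var_of(*bn)` on a module holder), assuming the buffer is a public member in both versions; the simpler alternative is just renaming the member at the line the compiler reports.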