libtorch-yolov3

How much faster is it than the pure Python code?

Open acgtyrant opened this issue 5 years ago • 9 comments

acgtyrant avatar Jan 07 '19 08:01 acgtyrant

@acgtyrant I tested it on P40, fps is 46.

zacario-li avatar Jan 18 '19 08:01 zacario-li

Thanks for your amazing work! And how about the speed on CPU? As far as I know, OpenCV 3.4.2 supports YOLOv3 inference at around 220 ms on a 12-thread Core i7-6850 CPU. Does this run faster? OpenCV performance numbers can be seen here: https://hk.saowen.com/a/8c0f58aa3914c3bef46fb29eb40c77522b25fd7c0672fc9eadb2b3fdc2a8fbfb
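
For anyone who wants to reproduce that comparison, a minimal OpenCV DNN timing sketch could look roughly like the following. It assumes a recent OpenCV build with the DNN module; the yolov3.cfg / yolov3.weights / dog.jpg paths and the 416x416 input size are placeholders, not from this repo:

#include <chrono>
#include <iostream>
#include <vector>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Load the Darknet config and weights (placeholder paths).
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov3.cfg", "yolov3.weights");

    // Preprocess the way YOLOv3 expects: scale to [0,1], resize to 416x416, swap BGR->RGB.
    cv::Mat img = cv::imread("dog.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(img, 1 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(), true, false);

    // Time only the forward pass.
    auto start = std::chrono::high_resolution_clock::now();
    net.setInput(blob);
    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());
    auto end = std::chrono::high_resolution_clock::now();

    std::cout << "OpenCV DNN forward: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
    return 0;
}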

zhangjinsong3 avatar Jan 19 '19 09:01 zhangjinsong3

@zhangjinsong3 I tested this on CPU (E5-2620 v4, 16 cores); it takes 1.89 s. It looks like OpenCV's DNN implementation is much faster.

zacario-li avatar Jan 21 '19 01:01 zacario-li

I did not test on CPU. Thanks, @zacario-li!

walktree avatar Jan 21 '19 02:01 walktree

@walktree Thanks for your work! I tested this on GPU (1080) and it takes 800 ms; on CPU (i7-7700K) it takes 3000 ms. Is something wrong?

RainAI avatar Mar 11 '19 03:03 RainAI

Not sure how you are testing, but I got 25 ms on GPU (1080 Ti).

There is a comment in the code noting that the first run takes longer. Please run the inference more than once and calculate the average cost.

e.g.

for (int t = 0; t < 10; t++)
{
    auto start = std::chrono::high_resolution_clock::now();

    auto output = net.forward(img_var);

    // filter the raw output by NMS
    // class_num = 80, confidence threshold = 0.6, NMS threshold = 0.4
    auto result = net.write_results(output, 80, 0.6, 0.4);

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    // note: the first run takes longer (warm-up), so average over several iterations
    std::cout << "inference taken : " << duration.count() << " ms" << std::endl;

    if (result.dim() == 1)
    {
        std::cout << "no object found" << std::endl;
    }
    else
    {
        int obj_num = result.size(0);

        std::cout << obj_num << " objects found" << std::endl;

        // draw and save the detections once
        if (t == 0)
        {
            // boxes come back in resized-input coordinates,
            // so scale them back to the original image size
            float w_scale = float(origin_image.cols) / input_image_size;
            float h_scale = float(origin_image.rows) / input_image_size;

            result.select(1, 1).mul_(w_scale);
            result.select(1, 2).mul_(h_scale);
            result.select(1, 3).mul_(w_scale);
            result.select(1, 4).mul_(h_scale);

            auto result_data = result.accessor<float, 2>();

            for (int i = 0; i < result.size(0); i++)
            {
                cv::rectangle(origin_image,
                              cv::Point(result_data[i][1], result_data[i][2]),
                              cv::Point(result_data[i][3], result_data[i][4]),
                              cv::Scalar(0, 0, 255), 1, 1, 0);
            }

            cv::imwrite("out-det.jpg", origin_image);
        }
    }
}

walktree avatar Mar 11 '19 08:03 walktree

Thank you! I tested again: about 35 ms.

RainAI avatar Mar 11 '19 09:03 RainAI

I compiled the libtorch source code on an NVIDIA TX2 (ARM architecture). When I built your YOLOv3-libtorch code, I got this error: "Darknet.cpp:577:12: error: 'class torch::nn::BatchNormImpl' has no member named 'running_variance'". What can I do? Thanks.

nsjiow avatar Mar 20 '19 06:03 nsjiow

Oh... I checked batchnorm.h and got the idea.
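
For anyone hitting the same error: the running-statistics members of torch::nn::BatchNormImpl were renamed in later libtorch releases, so the usual fix is to match whatever name your batchnorm.h actually declares. A hypothetical two-line change in Darknet.cpp, with placeholder names (bn_module, loaded_running_var), might look like this:

// old member name that Darknet.cpp was written against
bn_module->running_variance.copy_(loaded_running_var);

// newer libtorch headers expose the same buffer as running_var instead
bn_module->running_var.copy_(loaded_running_var);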

nsjiow avatar Mar 20 '19 06:03 nsjiow