tkDNN
Add a python demo
@ioir123ju could you update Readme.md to make your work clearer?
@thancaocuong I have updated readme.md.
@ioir123ju is it correct to say that, for now, the Python binding can only run with a batch size of 1?
@thancaocuong Yes, I did not complete the batch processing, but it's not difficult. You can try.
@ioir123ju did you get the same speed with this Python wrapper? I know Python can add overhead, but I don't know how much. With Python I'm getting 36 FPS with yolov4 FP16 and an input size of 416x416 on a Xavier AGX, whilst with the ./demo implementation I get 41 FPS. Did you see a similar drop in performance?
It runs at almost the same speed.
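To put the two reported numbers side by side, here is a minimal sketch that converts them into per-frame time. It uses only the 36 FPS and 41 FPS figures quoted above; the helper name `frame_time_ms` is made up for illustration, and the result says nothing about where the extra time is actually spent.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


python_fps = 36.0  # reported for the Python wrapper
native_fps = 41.0  # reported for the ./demo binary

# Extra time per frame when going through the Python wrapper.
overhead_ms = frame_time_ms(python_fps) - frame_time_ms(native_fps)

# Relative drop in throughput compared with the native demo.
slowdown_pct = (native_fps - python_fps) / native_fps * 100.0

print(f"{overhead_ms:.1f} ms extra per frame, {slowdown_pct:.1f}% fewer FPS")
# prints "3.4 ms extra per frame, 12.2% fewer FPS"
```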
@ioir123ju can you please add the yolov4-csp plugins too?
@ioir123ju am I right in thinking that to avoid memory leaks, a similar function to https://github.com/AlexeyAB/darknet/blob/master/src/network.c#L971 would be needed?
Hi! Can you please add support for TensorRT 8?
TensorRT 8 is supported on the tensorrt8 branch (https://github.com/ceccocats/tkDNN/tree/tensorrt8). If you are asking about Python support for TensorRT 8 in tkDNN, I am not sure about it.
@perseusdg
Thank you. I have already checked out the 'tensorrt8' branch and am now facing some compatibility issues with the Python binding.
I have modified the load_network function in darknetTR.cpp according to the new tk::dnn::Yolo3Detection interface:
tk::dnn::Yolo3Detection* load_network(char* net_cfg, char* cfg_path, char* name_path, int n_classes, int n_batch, float conf_thresh)
{
    std::string net;
    net = net_cfg;
    tk::dnn::Yolo3Detection *detNN = new tk::dnn::Yolo3Detection;
    detNN->init(net, cfg_path, name_path, n_classes, n_batch, conf_thresh);
    return detNN;
}
I also modified the header file and the input argtypes on the Python side accordingly:
load_network.argtypes = [c_char_p, c_char_p, c_char_p, c_int, c_int, c_float]
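For reference, a minimal sketch of how the full ctypes declaration for the six-argument function could look. The argtypes list is the one above. Everything else is an assumption made for illustration: the helper names `declare_load_network` and `as_c_path`, the explicit `restype`, and the library path and argument values in the usage comments. It is not taken from the actual darknetTR.py and is not claimed to be the cause of any particular crash. Two general ctypes facts motivate it: a function without an explicit `restype` is treated as returning a C `int`, which can truncate a 64-bit pointer, and `c_char_p` parameters expect `bytes` rather than `str` in Python 3.

```python
import ctypes
from ctypes import c_char_p, c_float, c_int, c_void_p


def declare_load_network(lib):
    """Attach the six-argument prototype to lib.load_network and return it."""
    fn = lib.load_network
    fn.argtypes = [c_char_p, c_char_p, c_char_p, c_int, c_int, c_float]
    # The C++ function returns a pointer; declare it as an opaque handle so
    # ctypes does not fall back to its default C int return type.
    fn.restype = c_void_p
    return fn


def as_c_path(path):
    """Encode str paths to bytes, as required by c_char_p parameters."""
    return path if isinstance(path, bytes) else path.encode("utf-8")


# Usage sketch (library path and argument values are placeholders):
# lib = ctypes.CDLL("./build/libdarknetTR.so", ctypes.RTLD_GLOBAL)
# load_network = declare_load_network(lib)
# net = load_network(as_c_path("build/yolo4_fp16.rt"),
#                    as_c_path("<cfg path>"), as_c_path("<names path>"),
#                    80, 1, 0.3)
```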
However, I now get a segmentation fault when running the demo Python script:
$ python darknetTR.py build/yolo4_fp16.rt --video=./demo/stabilized.mp4
build/yolo4_fp16.rt
New NetworkRT (TensorRT v8.01)
Float16 support: 1
Int8 support: 1
DLAs: 2
Segmentation fault (core dumped)
Any suggestions on how to proceed from here?
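One generic way to narrow down a crash like this, sketched under assumptions: Python's standard `faulthandler` module can print the Python-level traceback when the process receives SIGSEGV, which at least shows which ctypes call was in flight. The helper names below are hypothetical, and the file check is only a sanity test to run before handing paths to native code. Neither is a confirmed fix for this segfault.

```python
import faulthandler
import os


def enable_crash_traceback():
    """Ask CPython to dump the Python traceback on SIGSEGV and similar signals."""
    faulthandler.enable()
    return faulthandler.is_enabled()


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Usage sketch (cfg_path and name_path stand for whatever is passed to load_network):
# enable_crash_traceback()
# missing = missing_files(["build/yolo4_fp16.rt", cfg_path, name_path])
# if missing:
#     raise FileNotFoundError(f"not found: {missing}")
```

The same handler can be switched on without editing the script by starting the interpreter with `python -X faulthandler`.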
Any update here? I face the same problem: on JetPack 4.5.1, ioir123ju's Python interface does not run correctly.