Xiaodong (Vincent) Huang
Hello @zhaohb, `output[] = input[]` is still a copy; this implementation would not be faster than the native TensorRT op. Also, the `copyPackedKernel` is used not only for the slice backend, we need...
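As an aside, the view-vs-copy distinction behind this point can be sketched in plain Python (a stdlib analogy, not TensorRT code): a "slice" that shares storage moves no data, while one that writes `output[] = input[]` pays the full memory traffic either way, so it cannot beat the native op.

```python
# Stdlib analogy for slice-as-view vs slice-as-copy (not TensorRT code).
buf = bytearray(range(8))

view = memoryview(buf)[2:6]   # zero-copy "slice": shares the same storage
copy = bytes(buf[2:6])        # copying "slice": every element is moved

buf[2] = 99
assert view[0] == 99          # the view sees the change: no data was copied
assert copy[0] == 2           # the copy does not: an extra memcpy happened
```

A plugin that merely re-implements the copy is the second case; avoiding the extra kernel launch requires the slice to become a view (an offset into the producer's output), not another copy.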
Hello @zhaohb , yes, I need the log from engine generation. What you shared at https://drive.google.com/file/d/1jWcwHhHFpZ0qiRUIwA54qa7MSLL6BK9a/view?usp=sharing is the binary engine plan, not the text log redirected from the console,...
Hello @zhaohb , log received, but I do not see the slice being replaced by your plugin in this log. Could you send me the log from the build where you replace the `slice`...
Hello @zhaohb , the log file is OK; each `slice` is implemented with a `copyPackedKernel`. The log should not look the same if you replace the native `slice` with your plugin....
@zhaohb , could you show me the build log?
@zhaohb , if you implement the slice in the previous embedding, why is there still copying?
@zhaohb , you need to change the plugin implementation; otherwise you cannot avoid launching an extra kernel to do the copy operation.
@Dsqds how did you generate the ONNX, and is it calibrated? Thanks!
@Dsqds , maybe I missed something here. I was checking the code at https://github.com/Dsqds/pytorch-cifar100/blob/master/3-pytorch_quantization2onnx.py, and it is not there. Could you follow https://github.com/NVIDIA/TensorRT/blob/main/tools/pytorch-quantization/examples/torchvision/classification_flow.py#L357 ? In that code we call `enable_calib` before running the...
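For reference, the ordering the calibration flow expects (enable calibration, feed data through the model, then read back the collected range) can be sketched with a toy stand-in. `ToyQuantizer` below is hypothetical: it only mimics the `enable_calib` / `load_calib_amax` call order of `pytorch_quantization`'s `TensorQuantizer`, and is not the real API.

```python
# Toy stand-in for the calibration call order; the real class is
# pytorch_quantization.nn.TensorQuantizer (this one is hypothetical).
class ToyQuantizer:
    def __init__(self):
        self.calibrating = False
        self.amax = None
        self._seen = []

    def enable_calib(self):
        # Must be called BEFORE the calibration data is fed through.
        self.calibrating = True

    def observe(self, value):
        # Stands in for the forward pass during calibration.
        if self.calibrating:
            self._seen.append(abs(value))

    def load_calib_amax(self):
        # Read back the dynamic range collected during calibration.
        self.amax = max(self._seen) if self._seen else None


q = ToyQuantizer()
q.enable_calib()              # 1. enable calibration first
for v in (-3.0, 1.5, 2.0):
    q.observe(v)              # 2. run representative data
q.load_calib_amax()           # 3. then load the collected amax
assert q.amax == 3.0
```

If `enable_calib` is skipped (or called after the data pass), nothing is collected and no valid ranges end up in the exported ONNX.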
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!