Xiaodong (Vincent) Huang
Hello @zhaohb, `output[] = input[]` is still a copy; this implementation would not be faster than the native TensorRT op. Also, the `copyPackedKernel` is used not only for the slice backend, we need...
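As an aside, the view-vs-copy distinction behind this point can be sketched in plain Python (a stdlib analogy, not TensorRT code): a "slice" that shares storage moves no data, while one that writes `output[] = input[]` pays the full memory traffic either way, so it cannot beat the native op.

```python
# Stdlib analogy for slice-as-view vs slice-as-copy (not TensorRT code).
buf = bytearray(range(8))

view = memoryview(buf)[2:6]   # zero-copy "slice": shares the same storage
copy = bytes(buf[2:6])        # copying "slice": every element is moved

buf[2] = 99
assert view[0] == 99          # the view sees the change: no data was copied
assert copy[0] == 2           # the copy does not: an extra memcpy happened
```

A plugin that merely re-implements the copy is the second case; avoiding the extra kernel launch requires the slice to become a view (an offset into the producer's output), not another copy.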
Hello @zhaohb , yes, I need the log from engine generation. What you shared at https://drive.google.com/file/d/1jWcwHhHFpZ0qiRUIwA54qa7MSLL6BK9a/view?usp=sharing is the binary engine plan, not the text log redirected from the console,...
Hello @zhaohb , log received, but I do not see the slice being replaced by your plugin in this log. Could you send me the log from the build where you replace the `slice`...
Hello @zhaohb , the log file is OK; each `slice` is implemented with a `copyPackedKernel`. The log should not look the same if you replace the native `slice` with your plugin....
@zhaohb , could you show me the build log?
@zhaohb , if you implement the slice in the previous embedding, why is there still copying?
@zhaohb , you need to change the plugin implementation; otherwise you cannot avoid launching an extra kernel to do the copy operation.
@Dsqds how did you generate the ONNX, and is it calibrated? Thanks!
@Dsqds , maybe I missed something here. I was checking the code at https://github.com/Dsqds/pytorch-cifar100/blob/master/3-pytorch_quantization2onnx.py, and it is not there. Could you follow https://github.com/NVIDIA/TensorRT/blob/main/tools/pytorch-quantization/examples/torchvision/classification_flow.py#L357 ? In that code we call `enable_calib` before running the...
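For reference, the ordering the calibration flow expects (enable calibration, feed data through the model, then read back the collected range) can be sketched with a toy stand-in. `ToyQuantizer` below is hypothetical: it only mimics the `enable_calib` / `load_calib_amax` call order of `pytorch_quantization`'s `TensorQuantizer`, and is not the real API.

```python
# Toy stand-in for the calibration call order; the real class is
# pytorch_quantization.nn.TensorQuantizer (this one is hypothetical).
class ToyQuantizer:
    def __init__(self):
        self.calibrating = False
        self.amax = None
        self._seen = []

    def enable_calib(self):
        # Must be called BEFORE the calibration data is fed through.
        self.calibrating = True

    def observe(self, value):
        # Stands in for the forward pass during calibration.
        if self.calibrating:
            self._seen.append(abs(value))

    def load_calib_amax(self):
        # Read back the dynamic range collected during calibration.
        self.amax = max(self._seen) if self._seen else None


q = ToyQuantizer()
q.enable_calib()              # 1. enable calibration first
for v in (-3.0, 1.5, 2.0):
    q.observe(v)              # 2. run representative data
q.load_calib_amax()           # 3. then load the collected amax
assert q.amax == 3.0
```

If `enable_calib` is skipped (or called after the data pass), nothing is collected and no valid ranges end up in the exported ONNX.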
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!