Zero Zeng
A suggestion: after constant folding, the network structure is simpler:

```
polygraphy surgeon sanitize module.onnx --fold-constants -o module_folded.onnx
```
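If it helps, you can compare the graphs before and after folding with `polygraphy inspect model` (a sketch; the file names follow the command above):

```
# Summarize the original graph (inputs, outputs, nodes).
polygraphy inspect model module.onnx
# Summarize the folded graph; constant subgraphs should no longer appear.
polygraphy inspect model module_folded.onnx
```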
> Constant folding brings some performance degradation in my case. The ONNX file provided is a minimal part of the cross-attention module in my model. Running the ONNX by...
It's in https://github.com/NVIDIA/TensorRT/blob/main/tools/experimental/trt-engine-explorer/requirements.txt
@pranavm-nvidia maybe we should add a `pip install -r requirements.txt` step to https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer#installation
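For reference, a minimal install sketch (assuming you work from a fresh clone of the TensorRT repo; the paths follow the links above):

```
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT/tools/experimental/trt-engine-explorer
# Install the Python dependencies listed in requirements.txt
python3 -m pip install -r requirements.txt
```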
I cannot reproduce this on the official Docker image nvcr.io/nvidia/tensorrt:22.07-py3:

```
[08/15/2022-12:40:45] [I] Starting inference
[08/15/2022-12:40:48] [I] The e2e network timing is not reported since it is inaccurate due to...
```
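A sketch of a comparable setup, in case it helps you reproduce (the exact trtexec flags and model come from the original issue; `model.onnx` is a placeholder):

```
# Start the official TensorRT container
docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:22.07-py3
# Inside the container: build an engine and time inference
trtexec --onnx=model.onnx
```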
510.47.03. I would suggest just using the latest driver version.
It's expected that these layers cannot run on DLA: DLA doesn't support them. You can find the list of DLA-supported layers in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
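To see exactly which layers fall back, you can build with DLA plus GPU fallback and check the verbose log (a sketch; the ONNX file name is a placeholder):

```
# Build on DLA core 0; unsupported layers fall back to the GPU.
# The verbose log reports which layers run on DLA vs. the GPU.
trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --verbose
```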
What I know is that we usually only get about a 15% (often less) e2e perf improvement with sparsity enabled for most CNN-based models. Sometimes the speedup is not obvious because...
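If you want to measure this yourself, a sketch comparing dense and sparse builds with trtexec (the ONNX file name and the FP16 flag are placeholders for your setup):

```
# Baseline: dense kernels only
trtexec --onnx=model.onnx --fp16
# Allow sparse tactics for layers whose weights meet the 2:4 pattern
trtexec --onnx=model.onnx --fp16 --sparsity=enable
```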
I think what you see might be expected. I've done some perf tests with public models, and most of them get a perf improvement of less than 10% (especially when the...
> May I know what kinds of optimizations are present in the dense kernels that make them better than their sparse counterparts?

Lots of :-)

> If there is very...
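For an apples-to-apples kernel comparison regardless of the actual weight pattern, trtexec can force sparse tactics (a sketch; this is for benchmarking only, since it overwrites the weights):

```
# Force-overwrite weights to the 2:4 pattern so sparse tactics are
# always eligible; useful for perf comparison, not for accuracy.
trtexec --onnx=model.onnx --fp16 --sparsity=force
```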