Zero Zeng

Results: 582 comments of Zero Zeng

A suggestion: after constant folding, the network structure is simpler: ![image](https://user-images.githubusercontent.com/38289304/185964831-f3d48890-5daa-40dc-b91a-16158be915b6.png)

```
polygraphy surgeon sanitize module.onnx --fold-constants -o module_folded.onnx
```
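To make the suggestion concrete, here is a minimal toy sketch of what constant folding does, on a hand-rolled expression graph. The `fold_constants` helper and graph representation below are purely illustrative (not polygraphy internals): any node whose inputs are all constants is evaluated ahead of time and replaced by a single constant, which is why the folded graph looks simpler.

```python
# Toy illustration of constant folding (NOT polygraphy internals):
# nodes whose inputs are all known constants are evaluated up front
# and replaced by a constant, shrinking the graph.
import operator

OPS = {"Add": operator.add, "Mul": operator.mul}

def fold_constants(graph, constants):
    """graph: list of (op, input_names, output_name); constants: dict name -> value."""
    remaining = []
    for op, inputs, output in graph:
        if all(name in constants for name in inputs):
            # All inputs are constants: compute now and store the result.
            constants[output] = OPS[op](*(constants[n] for n in inputs))
        else:
            remaining.append((op, inputs, output))
    return remaining, constants

# "x" is a runtime input; "a" and "b" are weights baked into the model.
graph = [("Mul", ["a", "b"], "ab"), ("Add", ["x", "ab"], "y")]
folded, consts = fold_constants(graph, {"a": 2.0, "b": 3.0})
print(folded)        # only the Add that depends on the runtime input remains
print(consts["ab"])  # 6.0
```

Polygraphy's `--fold-constants` applies the same idea to ONNX graphs, which is why subgraphs made entirely of weights and shape math collapse into single initializers.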

> Constant folding brings some performance degradation for my case. The onnx file provided is a minimal part of the cross attention module in my model. Running the onnx by...

It's in https://github.com/NVIDIA/TensorRT/blob/main/tools/experimental/trt-engine-explorer/requirements.txt

@pranavm-nvidia maybe we should add a `pip install -r requirements.txt` step in https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer#installation

I cannot reproduce this with the official docker image nvcr.io/nvidia/tensorrt:22.07-py3:

```
[08/15/2022-12:40:45] [I] Starting inference
[08/15/2022-12:40:48] [I] The e2e network timing is not reported since it is inaccurate due to...
```

It's expected that these layers cannot run on DLA, because DLA doesn't support them. You can find the layers DLA supports in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
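As a rough illustration of the triage you'd do against that doc page, here is a small sketch that partitions layer type names into DLA candidates and GPU fallbacks. The `DLA_SUPPORTED` set below is an illustrative subset I've copied from memory of the developer guide, and `partition_for_dla` is a hypothetical helper; consult the linked doc for the authoritative, version-specific list and the per-layer restrictions.

```python
# Hedged sketch: partition layer types into DLA candidates vs GPU fallback.
# DLA_SUPPORTED is an illustrative subset, not the authoritative list --
# see the "Layers Supported by DLA" section of the TensorRT developer guide.
DLA_SUPPORTED = {
    "Convolution", "Deconvolution", "FullyConnected", "Pooling",
    "Activation", "ElementWise", "Scale", "Softmax", "Shuffle",
    "Concatenation", "Resize", "Slice",
}

def partition_for_dla(layer_types):
    """Split layer type names into (dla_candidates, gpu_fallback)."""
    dla = [t for t in layer_types if t in DLA_SUPPORTED]
    gpu = [t for t in layer_types if t not in DLA_SUPPORTED]
    return dla, gpu

dla, gpu = partition_for_dla(
    ["Convolution", "TopK", "Activation", "NonMaxSuppression"]
)
print(dla)  # ['Convolution', 'Activation']
print(gpu)  # ['TopK', 'NonMaxSuppression'] -- these fall back to the GPU
```

In a real build, TensorRT does this check for you (the Python API exposes `IBuilderConfig.can_run_on_dla(layer)`), and layers it rejects run on the GPU when `GPU_FALLBACK` is enabled.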

In my experience, we usually see only about a 15% (often less) end-to-end perf improvement with sparsity enabled for most CNN-based models. Sometimes the speedup is not obvious because...

I think what you see might be expected. I've run perf tests on several public models, and most of them get a perf improvement of less than 10% (especially when the...
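For context on what "sparsity" means here: TensorRT's sparse kernels target the 2:4 structured-sparsity pattern, where at most 2 of every 4 consecutive weights are nonzero. The sketch below (a minimal illustration, not NVIDIA's pruning tool) prunes a weight row to that pattern by keeping the 2 largest-magnitude values per group of 4, which is the usual magnitude-pruning heuristic.

```python
import numpy as np

# Hedged sketch of 2:4 structured sparsity: in every group of 4 consecutive
# weights, keep the 2 largest-magnitude values and zero out the other 2.
def prune_2_4(weights):
    """Prune a weight row (length divisible by 4) to the 2:4 pattern."""
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 4).copy()
    # Indices of the 2 smallest-magnitude entries in each group of 4.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)

w = np.array([0.9, -0.1, 0.05, 0.7, 0.2, -0.8, 0.3, 0.01])
pruned = prune_2_4(w)
print(pruned)  # -> [ 0.9  0.   0.   0.7  0.  -0.8  0.3  0. ]
# Every group of 4 now has at most 2 nonzeros, so the 2:4 constraint holds:
assert all((pruned.reshape(-1, 4) != 0).sum(axis=1) <= 2)
```

The pattern halves the math for the weight matrix, but since a layer's runtime also includes activations, memory traffic, and non-sparse layers, the end-to-end speedup is much smaller than 2x, consistent with the ~10-15% numbers above.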

> May I know what kinds of optimizations are present in the dense kernels that make them better than their sparse counterpart?

Lots of :-)

> If there is very...