Zero Zeng
A suggestion: after constant folding, the network structure is simpler:

```
polygraphy surgeon sanitize module.onnx --fold-constants -o module_folded.onnx
```
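If it helps, you can compare the graphs before and after folding with `polygraphy inspect model` (a sketch; the file names follow the command above):

```
# Summarize the original graph (inputs, outputs, nodes).
polygraphy inspect model module.onnx
# Summarize the folded graph; constant subgraphs should no longer appear.
polygraphy inspect model module_folded.onnx
```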
> Constant folding brings some performance degradation in my case. The ONNX file provided is a minimal part of the cross-attention module in my model. Running the ONNX by...
It's in https://github.com/NVIDIA/TensorRT/blob/main/tools/experimental/trt-engine-explorer/requirements.txt
@pranavm-nvidia maybe we should add a `pip install -r requirements.txt` step to https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer#installation
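For reference, a minimal install sketch (assuming you work from a fresh clone of the TensorRT repo; the paths follow the links above):

```
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT/tools/experimental/trt-engine-explorer
# Install the Python dependencies listed in requirements.txt
python3 -m pip install -r requirements.txt
```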
I cannot reproduce this on the official Docker image nvcr.io/nvidia/tensorrt:22.07-py3:

```
[08/15/2022-12:40:45] [I] Starting inference
[08/15/2022-12:40:48] [I] The e2e network timing is not reported since it is inaccurate due to...
```
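A sketch of a comparable setup, in case it helps you reproduce (the exact trtexec flags and model come from the original issue; `model.onnx` is a placeholder):

```
# Start the official TensorRT container
docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:22.07-py3
# Inside the container: build an engine and time inference
trtexec --onnx=model.onnx
```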
510.47.03. I would suggest just using the latest driver version.
It's expected that these layers cannot run on DLA: DLA doesn't support them. You can find the list of DLA-supported layers in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
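To see exactly which layers fall back, you can build with DLA plus GPU fallback and check the verbose log (a sketch; the ONNX file name is a placeholder):

```
# Build on DLA core 0; unsupported layers fall back to the GPU.
# The verbose log reports which layers run on DLA vs. the GPU.
trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --verbose
```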
What I know is that we usually only get about a 15% (often less) e2e perf improvement with sparsity enabled for most CNN-based models. Sometimes the speedup is not obvious because...
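If you want to measure this yourself, a sketch comparing dense and sparse builds with trtexec (the ONNX file name and the FP16 flag are placeholders for your setup):

```
# Baseline: dense kernels only
trtexec --onnx=model.onnx --fp16
# Allow sparse tactics for layers whose weights meet the 2:4 pattern
trtexec --onnx=model.onnx --fp16 --sparsity=enable
```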
I think what you see might be expected. I've done some perf tests with public models, and most of them get a perf improvement of less than 10% (especially when the...
> May I know what kinds of optimizations are present in the dense kernels that make them better than their sparse counterparts?

Lots of :-)

> If there is very...
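For an apples-to-apples kernel comparison regardless of the actual weight pattern, trtexec can force sparse tactics (a sketch; this is for benchmarking only, since it overwrites the weights):

```
# Force-overwrite weights to the 2:4 pattern so sparse tactics are
# always eligible; useful for perf comparison, not for accuracy.
trtexec --onnx=model.onnx --fp16 --sparsity=force
```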