Zero Zeng
@nvpohanh Any comments here? I think we support sparsity for GEMMs, and I've seen other users ask about it too, but I don't have a definitive answer.
1) We only support sparse GEMM if it can be converted to a Conv (e.g., the second input must be a constant and have only 2 dimensions); 2) if the GEMM...
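For illustration, here is a minimal sketch of a GEMM in that eligible shape with sparse tactics enabled, written against the TensorRT 8.x Python API. The input shape, weight values, and precision flags are assumptions, and the weights would need to be genuinely 2:4-pruned (e.g., with ASP) for sparse kernels to actually be selected:

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# GEMM whose second input is a constant with exactly 2 dimensions --
# the pattern that can be converted to a Conv and use sparse tactics.
x = network.add_input("x", trt.float32, (1, 128, 768))
w = np.zeros((768, 768), dtype=np.float32)  # assume 2:4-pruned weights, e.g. from ASP
const = network.add_constant(w.shape, trt.Weights(w))
gemm = network.add_matrix_multiply(x, trt.MatrixOperation.NONE,
                                   const.get_output(0), trt.MatrixOperation.NONE)
network.mark_output(gemm.get_output(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # let the builder pick sparse kernels
config.set_flag(trt.BuilderFlag.FP16)            # sparse tactics target FP16/INT8
engine_bytes = builder.build_serialized_network(network, config)
```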
> 1. Hence, is the 2:4 sparse NeMo BERT-Large SQuAD v1.1 result reported in https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ only simulated in ASP, but not actually measured when the model is compiled with TRT?...
> 2. When a transformer structure is detected, would TRT always offload the model to Myelin without any kernel profiling, even if the non-Myelin path might have better performance?

I think yes...
> can I use the same scripts to first generate a quantized, INT8-calibrated engine and then run the validation on any classification model, for example resnet18, squeezenet, etc.? ...
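For context, the generic calibrate-then-build flow such scripts rely on looks roughly like this. This is a sketch against the TensorRT 8.x Python API with pycuda; the batch iterable and cache file name are placeholders:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

class ClassifierCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed NCHW float32 batches to the builder during calibration."""

    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)   # any iterable of np.float32 arrays
        self.cache_file = cache_file
        self.d_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                # tells TRT that calibration is done
        if self.d_input is None:
            self.d_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()        # if a cache exists, calibration is skipped
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Hooked up at build time, alongside the INT8 flag:
#   config.set_flag(trt.BuilderFlag.INT8)
#   config.int8_calibrator = ClassifierCalibrator(my_batches)
```

The calibrator is model-agnostic: as long as the preprocessing matches what the network expects, the same flow applies to resnet18, squeezenet, or any other classification model.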
> Is it normal that we see a 3% accuracy drop from full-precision FP32 to INT8?

It might be expected; since the accuracy didn't drop much, you can...
> I am using `eval_coco.py` to run the validation and get this.

Maybe you just can't apply EfficientDet's eval scripts to YOLOv5 directly. Please solve it on your own...
> So what is the difference between a fake calibration cache and a real calibration cache?

The fake calibration cache is only used to test performance, to get...
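To make "fake" concrete: a calibration cache is just a text file of per-tensor scales, so one can be hand-written to skip calibration when only timing matters. A sketch assuming the TRT 8 entropy-calibration cache layout; the header string, tensor names, and scale values below are all invented, so an engine built from such a cache is only useful for performance measurement, never for accuracy:

```python
import struct

def scale_to_hex(scale: float) -> str:
    # Each scale is stored as the big-endian hex of its float32 bit pattern.
    return struct.pack(">f", scale).hex()

# Hypothetical tensor names; a real cache lists the network's actual tensors.
fake_scales = {"input": 1.0 / 127, "features_out": 0.05, "logits": 0.1}

with open("fake_calib.cache", "w") as f:
    f.write("TRT-8XXX-EntropyCalibration2\n")  # header must match your TRT build
    for name, scale in fake_scales.items():
        f.write(f"{name}: {scale_to_hex(scale)}\n")
```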
@nvpohanh Any suggestions here? I would suspect this is due to the model itself and the large input resolution.
You didn't specify the INT8 config flag. Please refer to https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.BuilderFlag and check our documentation carefully.
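Concretely, the missing piece looks like this (a minimal sketch; on top of the flag you still need either a calibrator or explicit per-tensor dynamic ranges):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # without this flag, INT8 tactics are never used
# plus either config.int8_calibrator = ... or set_dynamic_range() on each tensor
```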