Zero Zeng
@nvpohanh Any comments here? I think we support sparsity for GEMMs, and I've seen other users ask about it too, but I don't have a definitive answer.
1) We only support sparse GEMM if it can be converted to a Conv (e.g., the second input must be a constant and have only 2 dimensions); 2) if the GEMM...
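For illustration, here is a minimal sketch of a GEMM in that eligible shape with sparse tactics enabled, written against the TensorRT 8.x Python API. The input shape, weight values, and precision flags are assumptions, and the weights would need to be genuinely 2:4-pruned (e.g., with ASP) for sparse kernels to actually be selected:

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# GEMM whose second input is a constant with exactly 2 dimensions --
# the pattern that can be converted to a Conv and use sparse tactics.
x = network.add_input("x", trt.float32, (1, 128, 768))
w = np.zeros((768, 768), dtype=np.float32)  # assume 2:4-pruned weights, e.g. from ASP
const = network.add_constant(w.shape, trt.Weights(w))
gemm = network.add_matrix_multiply(x, trt.MatrixOperation.NONE,
                                   const.get_output(0), trt.MatrixOperation.NONE)
network.mark_output(gemm.get_output(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # let the builder pick sparse kernels
config.set_flag(trt.BuilderFlag.FP16)            # sparse tactics target FP16/INT8
engine_bytes = builder.build_serialized_network(network, config)
```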
> 1. Hence, is the 2:4 sparse NeMo BERT-Large SQuAD v1.1 result reported in https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ only simulated in ASP, but not actually measured when the model is compiled with TRT?...
> 2. When a transformer structure is detected, would TRT always offload the model to Myelin without any kernel profiling, even if the non-Myelin path might have better performance?

I think yes...
> can I use the same scripts to first generate a quantized, INT8-calibrated engine and then run the validation on any classification model, for example resnet18, squeezenet, etc.? ...
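For context, the generic calibrate-then-build flow such scripts rely on looks roughly like this. This is a sketch against the TensorRT 8.x Python API with pycuda; the batch iterable and cache file name are placeholders:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

class ClassifierCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed NCHW float32 batches to the builder during calibration."""

    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)   # any iterable of np.float32 arrays
        self.cache_file = cache_file
        self.d_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                # tells TRT that calibration is done
        if self.d_input is None:
            self.d_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()        # if a cache exists, calibration is skipped
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Hooked up at build time, alongside the INT8 flag:
#   config.set_flag(trt.BuilderFlag.INT8)
#   config.int8_calibrator = ClassifierCalibrator(my_batches)
```

The calibrator is model-agnostic: as long as the preprocessing matches what the network expects, the same flow applies to resnet18, squeezenet, or any other classification model.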
> Is it normal that we see a 3% accuracy drop from full-precision FP32 to INT8?

It might be expected; since the accuracy didn't drop much, you can...
> I am using `eval_coco.py` to run the validation and get this.

Maybe you just can't apply EfficientDet's eval scripts to YOLOv5 directly. Please solve it on your own...
> So what is the difference between a fake calibration cache and a real calibration cache?

The fake calibration cache is only used to test performance, to get...
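To make "fake" concrete: a calibration cache is just a text file of per-tensor scales, so one can be hand-written to skip calibration when only timing matters. A sketch assuming the TRT 8 entropy-calibration cache layout; the header string, tensor names, and scale values below are all invented, so an engine built from such a cache is only useful for performance measurement, never for accuracy:

```python
import struct

def scale_to_hex(scale: float) -> str:
    # Each scale is stored as the big-endian hex of its float32 bit pattern.
    return struct.pack(">f", scale).hex()

# Hypothetical tensor names; a real cache lists the network's actual tensors.
fake_scales = {"input": 1.0 / 127, "features_out": 0.05, "logits": 0.1}

with open("fake_calib.cache", "w") as f:
    f.write("TRT-8XXX-EntropyCalibration2\n")  # header must match your TRT build
    for name, scale in fake_scales.items():
        f.write(f"{name}: {scale_to_hex(scale)}\n")
```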
@nvpohanh Any suggestions here? I would suspect this is due to the model itself and the large input resolution.
You didn't specify the INT8 config flag. Please refer to https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.BuilderFlag and check our documentation carefully.
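Concretely, the missing piece looks like this (a minimal sketch; on top of the flag you still need either a calibrator or explicit per-tensor dynamic ranges):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # without this flag, INT8 tactics are never used
# plus either config.int8_calibrator = ... or set_dynamic_range() on each tensor
```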