Zero Zeng
> Does "export it to onnx" mean using the converted saved_model to export to onnx? Yes. BTW I used https://huggingface.co/docs/transformers/model_doc/t5 before and I think it's more convenient to export to onnx :-)
{HF T5 exported to ONNX} would be better IMHO. AFAIK, when you deploy using TF-TRT, there are inevitable framework overheads introduced by the conversion of the TF IR to the TRT IR....
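For reference, a minimal sketch of exporting an HF T5 model to ONNX with torch.onnx.export; the model name, opset, and input/output names below are placeholders, not taken from this issue, so adjust them for your model:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# "t5-small" is a placeholder; use your own checkpoint.
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
model.config.use_cache = False    # drop past_key_values for a cleaner graph
model.config.return_dict = False  # return tuples so the ONNX exporter can flatten outputs
tokenizer = T5Tokenizer.from_pretrained("t5-small")

enc = tokenizer("translate English to German: hello world", return_tensors="pt")
decoder_input_ids = torch.full((1, 1), model.config.decoder_start_token_id, dtype=torch.long)

torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"], decoder_input_ids),
    "t5.onnx",
    input_names=["input_ids", "attention_mask", "decoder_input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "enc_seq"},
        "attention_mask": {0: "batch", 1: "enc_seq"},
        "decoder_input_ids": {0: "batch", 1: "dec_seq"},
    },
    opset_version=13,
)
```

The resulting t5.onnx can then be fed to trtexec or the ONNX parser directly, which avoids the TF-TRT IR-conversion overhead mentioned above.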
Can you share the exported onnx model here? Thanks! At first glance I think I can't export the onnx model from your code due to the lines below: ``` sys.path.append('/ssd1/xingyum/models/STTN') from...
I can reproduce this and I've filed an internal bug to track it, thanks for reporting this.
BTW if you want to WAR this in the short term, would it be possible to use a static shape? e.g.
```
./trtexec --onnx=alexnet.onnx --fp16 --optShapes=input_1:1x3x720x1280,input_2:1x1x720x1280
```
works for me.
This will be fixed in the next major version. Thanks again for reporting this :-)
You need to build the engine on the same GPU that you run inference on.
TensorRT didn't pick the sparse implementation because the dense implementation is faster than the sparse one, and TRT will always choose the fastest kernel. A similar thing happens with fp16 and int8. If...
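In case it helps, here is a minimal Python sketch (TensorRT 8.x builder API; "model.onnx" is a placeholder) of enabling sparse weights at build time. Even with the flag set, TRT only selects a sparse kernel when it actually wins the timing against the dense one:

```python
import tensorrt as trt

# Verbose logging prints the tactics TRT times, so you can see which kernel wins.
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # allow sparse tactics to compete
config.set_flag(trt.BuilderFlag.FP16)

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```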
@azhurkevich ^ ^
trtexec reads the inputs as raw buffers, so I'm not sure whether it will work if you use npy. If you don't specify --loadInputs, does it still produce NaN output? Or can...
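If the npy file is the problem, a minimal sketch (assuming numpy; "input_1" and the dtype are placeholders for whatever your engine expects) for dumping the array as the raw buffer trtexec reads:

```python
import numpy as np

# Load the existing .npy and write it back out as raw bytes, no npy header.
arr = np.load("input_1.npy").astype(np.float32)  # match the engine's input dtype
arr.tofile("input_1.bin")
# then: ./trtexec --loadEngine=model.engine --loadInputs=input_1:input_1.bin
```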