Kevin Chen
Are you able to share the model?
This is unfortunately a known limitation of the Einsum layer in TRT - we only support floating-point types for Einsum equations. Do you know which operation this Einsum equation...
It's possible that one of the inputs is being interpreted incorrectly as INT32. Can you provide the converted .onnx model?
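One quick way to check the declared dtypes in the exported model is to inspect it with the `onnx` Python package (a minimal sketch; `model.onnx` is a placeholder path, and this only covers graph inputs and initializers, not intermediate tensors):

```python
import onnx

model = onnx.load("model.onnx")  # placeholder path

# Print the declared element type of every graph input
for inp in model.graph.input:
    elem_type = inp.type.tensor_type.elem_type
    print(inp.name, onnx.TensorProto.DataType.Name(elem_type))

# Initializers (weights/constants) carry their own dtype field
for init in model.graph.initializer:
    print(init.name, onnx.TensorProto.DataType.Name(init.data_type))
```

If any tensor feeding the Einsum shows up as INT32/INT64 here, that would explain the failure.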
What version of TRT are you using? Are you able to share the model?
It looks like you are using the opset 13 version of Unsqueeze, which is currently unsupported. Can you try exporting your model to a lower opset (e.g., opset 11)?
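For example, if the model comes from PyTorch (an assumption; adjust for your exporter), you can pin the opset at export time. A minimal self-contained sketch:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def forward(self, x):
        # unsqueeze is the op that fails to import at opset 13
        return x.unsqueeze(0)

torch.onnx.export(
    TinyNet(),
    torch.randn(3, 4),
    "model_opset11.onnx",
    opset_version=11,  # target opset 11 instead of 13
)
```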
Which protobuf version are you using?
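If you're unsure, one way to check the version of the installed protobuf Python bindings:

```python
import google.protobuf
print(google.protobuf.__version__)
```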
What TRT version are you using? Are you able to provide the models you are benchmarking?
Yes, providing the models in ONNX form would be useful. Are you seeing the same performance difference with the latest version of TRT?
I recommend using the tools described in https://github.com/onnx/onnx-tensorrt/blob/master/docs/faq.md#how-do-i-import-and-run-an-onnx-model-through-tensorrt for generating TensorRT engines from ONNX models, since those tools are better supported.
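For example, `trtexec` (shipped with TensorRT) can build an engine directly from an ONNX file; the file names here are placeholders:

```
trtexec --onnx=model.onnx --saveEngine=model.engine
```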
You can set `use_cache=False` for now. The kv cache feature is not fully supported in notebooks yet. We'll add updated notebooks supporting this feature in one of our next releases.
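As a minimal sketch (assuming a Hugging Face transformers model; `gpt2` is just an illustrative checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2", use_cache=False)

inputs = tokenizer("Hello, world", return_tensors="pt")
# Disable the KV cache during generation as well
outputs = model.generate(**inputs, use_cache=False, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```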