TensorRT
Implement embedding bag converter
Helpful links
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-loops
https://github.com/pytorch/pytorch/blob/main/torch/nn/functional.py#L2368-L2391
https://github.com/pytorch/pytorch/issues/25469
- 1D input with data-dependent (DD) offsets (ITensor) -> implement the logic directly in TensorRT (ILoop)
- 2D input -> flatten, generate the offsets on the fly, then run the 1D path and reshape
- Undefined 2D behavior: it is unclear what PyTorch does in this case
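
For reference, the two cases in the list can be sketched in plain Python. This is not the converter and uses no TensorRT API; it only mirrors the documented embedding bag semantics (bag i covers the indices from offsets[i] up to the next offset, and empty bags yield zeros) so the ILoop version has something to be checked against. The names embedding_bag_1d and embedding_bag_2d are hypothetical, and the 2D variant assumes every row is one fixed-length bag.

```python
def embedding_bag_1d(weight, indices, offsets, mode="sum"):
    """Reference semantics: bag i covers indices[offsets[i]:offsets[i + 1]]."""
    dim = len(weight[0])
    # The last bag runs to the end of the index list.
    ends = list(offsets[1:]) + [len(indices)]
    out = []
    for start, end in zip(offsets, ends):
        bag = [weight[i] for i in indices[start:end]]
        if not bag:
            # Empty bags produce a zero vector.
            out.append([0.0] * dim)
            continue
        if mode == "max":
            row = [max(col) for col in zip(*bag)]
        else:
            row = [sum(col) for col in zip(*bag)]
            if mode == "mean":
                row = [v / len(bag) for v in row]
        out.append(row)
    return out


def embedding_bag_2d(weight, indices_2d, mode="sum"):
    """2D case: flatten, generate offsets on the fly, reuse the 1D path."""
    bag_len = len(indices_2d[0])
    flat = [i for row in indices_2d for i in row]
    # Each row is one bag, so bag b starts at b * bag_len.
    offsets = [b * bag_len for b in range(len(indices_2d))]
    return embedding_bag_1d(weight, flat, offsets, mode)
```

In this sketch the 1D path already returns one row per bag, so the 2D result has shape (number of rows, embedding dim) without a further reshape; a real converter working on flat tensors would presumably still need one.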
https://github.com/pytorch/pytorch/issues/25469
TensorRT team just replied: "There's a known bug 4411383 where a network with DDS but not DS requires an optimization profile. For that bug, the easy way to avoid it in 9.2 was to enable profile sharing (PROFILE_SHARING_0806)."
https://github.com/pytorch/pytorch/blob/8ca8729321a8c858c6bc33318ce2b80b8a5c900e/torch/onnx/symbolic_opset11.py#L1309