Embedding_bag operator on GPU

Open rishucoding opened this issue 2 years ago • 7 comments

Hello,

NVIDIA's MLPerf results recommend using the TensorRT framework for performant inference deployment. For DLRM (deep-learning-based recommendation system) inference on GPU, I have the following questions:

  • Does TensorRT modify the backend (CUDA/C++ source code) of the EmbeddingBag operator, or does it use the exact same vanilla PyTorch CUDA kernels?

  • What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Please let me know your thoughts. Thanks!
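For context on what the operator in question computes: `torch.nn.EmbeddingBag` is essentially a gather followed by a per-bag reduction. A minimal pure-Python sketch of the sum-mode semantics (this mirrors the PyTorch operator's contract, not any particular CUDA or TensorRT implementation):

```python
# Sketch of EmbeddingBag (mode="sum") semantics, assuming a dense
# embedding table, flat indices, and per-bag offsets. Illustrative
# only -- not PyTorch's or TensorRT's actual kernel.

def embedding_bag_sum(table, indices, offsets):
    """table: list of row vectors; indices: flat lookup ids;
    offsets: start position of each bag within `indices`."""
    dim = len(table[0])
    bounds = list(offsets) + [len(indices)]
    bags = []
    for start, end in zip(bounds, bounds[1:]):
        acc = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]                          # gather
            acc = [a + r for a, r in zip(acc, row)]   # reduce (sum)
        bags.append(acc)
    return bags

table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = embedding_bag_sum(table, indices=[0, 2, 1], offsets=[0, 2])
# bag 0 = row 0 + row 2 = [6.0, 8.0]; bag 1 = row 1 = [3.0, 4.0]
```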

rishucoding avatar Sep 13 '23 15:09 rishucoding

@nvpohanh ^ ^

zerollzeng avatar Sep 17 '23 12:09 zerollzeng

For the Gather operation, TRT generates the kernel dynamically and tries to fuse it with other pointwise operations where possible. That means we do not use the same Gather kernels as PyTorch does.
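To make the fusion pattern concrete, here is a hypothetical pure-Python illustration (the operator shapes and the ReLU epilogue are my assumptions, not TRT internals): an unfused pipeline materializes the gathered rows and then launches a second pointwise pass, while a fused kernel applies the pointwise op as each element is read, skipping the intermediate buffer.

```python
# Illustrative gather + pointwise fusion, with ReLU standing in as an
# arbitrary pointwise op. Not TensorRT code -- just the access pattern.

def gather_then_relu_unfused(table, indices):
    gathered = [table[i][:] for i in indices]                # pass 1: gather
    return [[max(x, 0.0) for x in row] for row in gathered]  # pass 2: pointwise

def gather_relu_fused(table, indices):
    # single pass: pointwise op applied at load time, no intermediate
    return [[max(x, 0.0) for x in table[i]] for i in indices]

table = [[-1.0, 2.0], [3.0, -4.0]]
assert gather_then_relu_unfused(table, [1, 0]) == gather_relu_fused(table, [1, 0])
```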

nvpohanh avatar Sep 18 '23 03:09 nvpohanh

What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Our MLPerf-Inference submission uses TensorRT for the DLRM benchmark: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA

Using TensorRT allows more aggressive fusions like Gemm+Pointwise fusions.
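As an illustration of what a Gemm+Pointwise fusion buys, here is a hypothetical pure-Python stand-in (the bias-add and ReLU epilogue are assumed for the example, roughly matching a DLRM MLP layer): the pointwise work is folded into the matmul's output loop instead of running as separate kernels over a materialized result.

```python
# Sketch of a GEMM with a fused bias + ReLU epilogue. Illustrative
# only -- real fused kernels do this inside the GPU matmul epilogue.

def gemm_bias_relu_fused(A, B, bias):
    rows, inner, cols = len(A), len(B), len(B[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(A[i][k] * B[k][j] for k in range(inner))
            row.append(max(acc + bias[j], 0.0))  # epilogue: bias add + ReLU
        out.append(row)
    return out

A = [[1.0, -2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
print(gemm_bias_relu_fused(A, B, bias=[0.5, 0.5]))  # [[1.5, 0.0]]
```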

nvpohanh avatar Sep 18 '23 03:09 nvpohanh

Closing since there has been no activity for more than 3 weeks. Thanks, all!

ttyio avatar Oct 10 '23 20:10 ttyio

Thanks @nvpohanh for the comments. Could you share the source code for the TRT implementation of the Gather kernel used in the embedding stage for DLRMs? Also, could you compare the TRT Gather kernel with the PyTorch embedding-stage CUDA kernel (link)?

rishucoding avatar Feb 08 '24 18:02 rishucoding

@nvpohanh ^ ^

zerollzeng avatar Feb 13 '24 07:02 zerollzeng

Hi -- could you please share your comments on my follow-up question? Thanks.

rishucoding avatar Jun 16 '24 00:06 rishucoding