
torch.compile + CUDA Graph optimization for bs=1


PR request for a PyTorch blog post.

Summary: This post is the fourth part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. In this post, we focus on speeding up FAIR's Seamless M4T-v2 model: a 2x speedup for the text decoder module and a 30x speedup for the vocoder module, yielding a 2.7x end-to-end inference speedup with no loss of accuracy, by using CUDA Graphs and the native PyTorch optimization torch.compile.
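For reference, here is a minimal sketch of the core idea. The `TinyDecoder` module below is a hypothetical stand-in, not the actual Seamless M4T-v2 code: the point is that `torch.compile(mode="reduce-overhead")` enables CUDA Graphs, which amortize kernel-launch overhead, the dominant cost for small bs=1 decoding steps.

```python
import torch

# Illustrative stand-in for a decoder-style module; the real Seamless
# M4T-v2 text decoder and vocoder live in fairseq2 and are more involved.
class TinyDecoder(torch.nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
        self.out = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.proj(x)))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyDecoder().to(device).eval()

# mode="reduce-overhead" tells torch.compile to use CUDA Graphs where it
# can, replaying a captured sequence of kernels instead of launching
# them one by one from Python.
compiled = torch.compile(model, mode="reduce-overhead")

# bs=1 input with a fixed shape; CUDA Graph capture requires stable
# shapes, which torch.compile manages internally.
x = torch.randn(1, 512, device=device)

with torch.inference_mode():
    for _ in range(3):   # warm-up iterations trigger compilation/capture
        compiled(x)
    y = compiled(x)      # subsequent calls replay the captured graph
print(y.shape)
```

The blog post applies this approach to the real text decoder and vocoder modules, where the per-step work is small enough that launch overhead dominates.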

[Figure: End-to-end Inference Speedup]

YJYJLee, Jan 18 '24