How does TensorRT leverage attention masks to speed up inference?

Open MatthieuToulemont opened this issue 7 months ago • 4 comments

Hello team,

Thanks for all the great work.

I am training a model with tile-wise constant attention masks (see picture below). At inference time, how will TensorRT leverage this type of attention mask to speed up inference?
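To make the setup concrete, here is a minimal NumPy sketch of what a tile-wise constant mask means: every tile of the attention matrix is either fully allowed or fully masked. The tile size, the example pattern, and the helper names (`expand_tile_pattern`, `skippable_tile_fraction`) are assumptions made up for illustration. Nothing here reflects what TensorRT does internally.

```python
import numpy as np


def expand_tile_pattern(tile_pattern: np.ndarray, tile_size: int) -> np.ndarray:
    """Expand a (Tq, Tk) boolean tile pattern into a full attention mask.

    Each entry of tile_pattern becomes a tile_size x tile_size block that is
    constant (all True = attend, all False = masked out).
    """
    rows = np.repeat(tile_pattern, tile_size, axis=0)
    return np.repeat(rows, tile_size, axis=1)


def skippable_tile_fraction(tile_pattern: np.ndarray) -> float:
    """Fraction of tiles that are fully masked.

    A block-sparse attention kernel could in principle skip these tiles;
    whether a given runtime does so is a separate question.
    """
    return 1.0 - float(tile_pattern.mean())


# Hypothetical 3x3 tile pattern (True = tile is attended to).
pattern = np.array(
    [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]],
    dtype=bool,
)
tile_size = 4  # assumed tile size, for illustration only

mask = expand_tile_pattern(pattern, tile_size)
fraction = skippable_tile_fraction(pattern)

print(mask.shape)
print(f"fully masked tiles: {fraction:.2%}")
```

In this toy pattern 4 of the 9 tiles are fully masked, so a kernel that is aware of the tile structure would have less work to do than one that treats the mask as dense.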

Apr 04 '25 08:04 MatthieuToulemont
