
How does TensorRT leverage attention masks to speed up inference?

Open MatthieuToulemont opened this issue 7 months ago • 4 comments

Hello team,

Thanks for all the great work.

I am training a model with tile-wise constant attention masks (see picture below). At inference time, how will TensorRT leverage this type of attention mask to speed up inference?

Image
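To make the question concrete, here is a minimal NumPy sketch of what "tile-wise constant" can mean: the mask is built from a small per-tile pattern, so every tile is either fully attended or fully masked. The function names, tile size, and pattern below are illustrative assumptions, not taken from the model in question, and nothing here describes what TensorRT does internally. It only shows the structure that a kernel could in principle exploit.

```python
import numpy as np

def expand_tile_pattern(tile_pattern, tile_size):
    """Expand a per-tile boolean pattern into a full (seq, seq) attention mask."""
    p = np.asarray(tile_pattern, dtype=bool)
    # Repeat each tile entry tile_size times along rows, then along columns.
    return np.repeat(np.repeat(p, tile_size, axis=0), tile_size, axis=1)

def tile_summary(mask, tile_size):
    """Return an (n_tiles, n_tiles) array: True where a tile has any attended entry."""
    n = mask.shape[0] // tile_size
    tiles = mask.reshape(n, tile_size, n, tile_size)
    return tiles.any(axis=(1, 3))

# Hypothetical 3x3 tile pattern (True = attend, False = masked out).
pattern = [[1, 0, 0],
           [1, 1, 0],
           [0, 1, 1]]
tile_size = 4

mask = expand_tile_pattern(pattern, tile_size)   # shape (12, 12)
active = tile_summary(mask, tile_size)           # recovers the tile pattern
skippable = 1.0 - active.mean()                  # fraction of fully masked tiles

print(mask.shape, skippable)
```

In this toy pattern 4 of the 9 tiles are entirely masked, so a tile-aware attention kernel could skip the corresponding score and softmax work. Whether TensorRT does so is exactly the open question.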

MatthieuToulemont, Apr 04 '25 08:04