TensorRT
How does TensorRT leverage attention masks to speed up inference?
Hello team,
Thanks for all the great work.
I am training a model with tile-wise constant attention masks (see picture below). At inference time, how will TensorRT leverage this type of attention mask to speed up inference?
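To make "tile-wise constant" concrete, here is a minimal NumPy sketch of the kind of mask I mean. The tile size, the 3x3 tile layout, and the helper name `tile_constant_mask` are made up for illustration and are not taken from my actual model or from any TensorRT API.

```python
import numpy as np

def tile_constant_mask(tile_mask: np.ndarray, tile_size: int) -> np.ndarray:
    """Expand a (T, T) per-tile boolean mask into a
    (T * tile_size, T * tile_size) token-level mask.
    Every element inside a tile shares that tile's value."""
    rows = np.repeat(tile_mask, tile_size, axis=0)
    return np.repeat(rows, tile_size, axis=1)

# Hypothetical tile layout: True = attention allowed, False = masked out.
tile_mask = np.array([
    [True,  False, False],
    [True,  True,  False],
    [False, True,  True],
])

mask = tile_constant_mask(tile_mask, tile_size=4)  # shape (12, 12)

# Tiles that are entirely masked are the ones a kernel could,
# in principle, skip without computing any scores.
skippable_tiles = int((~tile_mask).sum())
```

In this sketch the token-level mask is fully determined by the much smaller per-tile mask, which is the structure I am hoping can be exploited at inference time.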