TensorRT-LLM
Is there any plan to support Dual Chunk Attention (DCA)?
DCA is an effective method for long-context inference. Does TensorRT-LLM plan to support this feature?
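For context, DCA's core idea is to remap relative positions chunk-by-chunk so that no query-key distance ever exceeds the window the model saw during pretraining. The toy NumPy sketch below shows only that shared intuition (distance capping); the real DCA combines three separate intra-/inter-/successive-chunk attention patterns, and nothing here is existing TensorRT-LLM API:

```python
import numpy as np

def capped_relative_positions(seq_len: int, max_rel: int) -> np.ndarray:
    """Lower-triangular matrix of query-key distances, capped at max_rel.

    Capping keeps every attended distance inside the pretraining range,
    which is the basic trick behind training-free long-context schemes
    like DCA (simplified here). Entries above the diagonal (future keys)
    are set to -1 purely as a causal-mask placeholder.
    """
    q = np.arange(seq_len)[:, None]   # query index i (column vector)
    k = np.arange(seq_len)[None, :]   # key index j (row vector)
    rel = np.minimum(q - k, max_rel)  # cap long-range distances at max_rel
    rel[q < k] = -1                   # future keys: masked in real attention
    return rel
```

For example, with `seq_len=8` and `max_rel=3`, a query at position 7 attending to the key at position 0 gets distance 3 instead of 7, so the position embedding stays in-distribution.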