Is there any plan to support Dual Chunk Attention (DCA)?

Open • gaoteng-git opened this issue 7 months ago • 1 comment

DCA is an effective method for running inference over long input contexts. Does TensorRT-LLM plan to support this feature?

gaoteng-git • Jul 05 '24 06:07