TensorRT-LLM
Is there any plan to support Dual Chunk Attention (DCA)?
DCA is an effective method for long-context inference. Does TensorRT-LLM plan to support this feature?
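For context, DCA's core idea is to remap relative positions chunk-by-chunk so that no query-key distance ever exceeds the window the model saw during pretraining. The toy NumPy sketch below shows only that shared intuition (distance capping); the real DCA combines three separate intra-/inter-/successive-chunk attention patterns, and nothing here is existing TensorRT-LLM API:

```python
import numpy as np

def capped_relative_positions(seq_len: int, max_rel: int) -> np.ndarray:
    """Lower-triangular matrix of query-key distances, capped at max_rel.

    Capping keeps every attended distance inside the pretraining range,
    which is the basic trick behind training-free long-context schemes
    like DCA (simplified here). Entries above the diagonal (future keys)
    are set to -1 purely as a causal-mask placeholder.
    """
    q = np.arange(seq_len)[:, None]   # query index i (column vector)
    k = np.arange(seq_len)[None, :]   # key index j (row vector)
    rel = np.minimum(q - k, max_rel)  # cap long-range distances at max_rel
    rel[q < k] = -1                   # future keys: masked in real attention
    return rel
```

For example, with `seq_len=8` and `max_rel=3`, a query at position 7 attending to the key at position 0 gets distance 3 instead of 7, so the position embedding stays in-distribution.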