TensorRT-LLM
Does TensorRT-LLM support separating the prefill stage and the decode stage onto different GPUs or different nodes, with user-defined configuration? If so, how can it be set up?
This has already been done in https://github.com/vllm-project/vllm/pull/2809 and https://github.com/LLMServe/DistServe.
Reference: https://arxiv.org/pdf/2311.18677
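
To make the request concrete, here is a toy sketch of the split I have in mind. It is plain Python and does not use any TensorRT-LLM API: `PrefillWorker`, `DecodeWorker`, `KVCache`, `toy_next_token`, and the device names are all hypothetical, and the "model" is a stand-in arithmetic function. It only illustrates the control flow: one worker processes the prompt and builds the KV cache, the cache is handed over, and a second worker generates the remaining tokens.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from queue import Queue


@dataclass
class KVCache:
    # Stand-in for per-layer key/value tensors: just the token ids seen so far.
    tokens: list[int] = field(default_factory=list)


def toy_next_token(cache: KVCache) -> int:
    # Hypothetical "model": a deterministic function of the cached tokens.
    return (sum(cache.tokens) + len(cache.tokens)) % 100


class PrefillWorker:
    """Would run on one GPU/node: processes the whole prompt in one pass."""

    def __init__(self, device: str) -> None:
        self.device = device  # assumed label only, no real device is used

    def run(self, prompt: list[int]) -> tuple[KVCache, int]:
        cache = KVCache(tokens=list(prompt))
        first_token = toy_next_token(cache)
        return cache, first_token


class DecodeWorker:
    """Would run on a different GPU/node: generates one token per step."""

    def __init__(self, device: str) -> None:
        self.device = device

    def run(self, cache: KVCache, first_token: int, max_new_tokens: int) -> list[int]:
        output = [first_token]
        cache.tokens.append(first_token)
        while len(output) < max_new_tokens:
            token = toy_next_token(cache)
            output.append(token)
            cache.tokens.append(token)
        return output


def serve(prompt: list[int], max_new_tokens: int) -> list[int]:
    prefill = PrefillWorker("gpu:0")
    decode = DecodeWorker("gpu:1")
    # The queue stands in for the KV-cache transfer between devices or nodes.
    transfer: Queue = Queue()
    transfer.put(prefill.run(prompt))
    cache, first_token = transfer.get()
    return decode.run(cache, first_token, max_new_tokens)


if __name__ == "__main__":
    print(serve([1, 2, 3], 4))
```

In a real deployment the queue would be a GPU-to-GPU or network transfer of the KV cache, and the placement of each worker is the part I would like to configure myself.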