TensorRT-LLM

Can TensorRT-LLM separate the prefill stage and the decode stage onto different GPUs or different nodes with a custom configuration, and if so, how?

Open GGBond8488 opened this issue 5 months ago • 2 comments

For example, https://github.com/vllm-project/vllm/pull/2809 and https://github.com/LLMServe/DistServe have already done this.

Reference: https://arxiv.org/pdf/2311.18677

GGBond8488, Sep 18 '24 03:09