Can TensorRT-LLM run the prefill stage and the decode stage on different GPUs or different nodes with a custom configuration, and if so, how?

Open GGBond8488 opened this issue 5 months ago • 2 comments

For example, this has been done in https://github.com/vllm-project/vllm/pull/2809 and https://github.com/LLMServe/DistServe.

Reference: https://arxiv.org/pdf/2311.18677

Sep 18 '24 03:09 GGBond8488