PD-disaggregated deployment of the DeepSeek-R1-FP8 model: the tp=16 prefill service fails at startup
[Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 0
[Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 1
[Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 2
[Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 3
[Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 4
[Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 5
[Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 6
[Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 7
INFO 09-30 03:02:12 [manager.py:193] use req queue ChunkedPrefillQueue
INFO 09-30 03:02:14 [cache_tensor_manager.py:17] USE_GPU_TENSOR_CACHE is On
All deep_gemm operations loaded successfully!
INFO 09-30 03:02:15 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-30 03:02:15 [light_utils.py:13] lightllm_kernel is not installed, you can't use the api of it.
WARNING 09-30 03:02:16 [nixl_kv_transporter.py:19] nixl is not installed, which is required for pd disagreggation!!!
Process Process-2:9:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 233, in _init_env
    manager = PrefillKVMoveManager(args, info_queue, mem_queues)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 40, in __init__
    assert self.dp_world_size <= self.node_world_size
AssertionError
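(Note: the failing assert compares dp_world_size with node_world_size. Assuming dp_world_size here is the number of ranks in a single data-parallel group, --tp 16 with the default dp of 1 yields one 16-rank group, while each 8-GPU node has node_world_size = 8, so the check 16 <= 8 fails. This reading is consistent with the dp 16 advice given below.)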
@wenruihua Could you share your launch command?
PD disaggregation is not currently supported in tp 16 mode. You need to enable dp 16.
#pd_prefill_0.sh
export host=10.24.62.3
export pd_master_ip=10.24.62.3
export nccl_host=10.24.62.3
#nvidia-cuda-mps-control -d
LOADWORKER=18 python -m lightllm.server.api_server \
    --model_dir /mnt/model/DeepSeek-R1 \
    --run_mode "prefill" \
    --tp 16 \
    --host $host \
    --port 8019 \
    --nnodes 2 \
    --node_rank 0 \
    --nccl_host $nccl_host \
    --nccl_port 2732 \
    --enable_fa3 \
    --disable_cudagraph \
    --pd_master_ip $pd_master_ip \
    --pd_master_port 8000
#pd_prefill_1.sh
export host=10.24.62.9
export pd_master_ip=10.24.62.3
export nccl_host=10.24.62.3
#nvidia-cuda-mps-control -d
LOADWORKER=18 python -m lightllm.server.api_server \
    --model_dir /mnt/model/DeepSeek-R1 \
    --run_mode "prefill" \
    --tp 16 \
    --host $host \
    --port 8019 \
    --nnodes 2 \
    --node_rank 1 \
    --nccl_host $nccl_host \
    --nccl_port 2732 \
    --enable_fa3 \
    --disable_cudagraph \
    --pd_master_ip $pd_master_ip \
    --pd_master_port 8000
I launched the DeepSeek-R1-FP8 model with tp across 16 GPUs. Do you mean I should change tp16 to dp16?
@wenruihua PD disaggregation cannot currently support tp 16, but dp 16 can do PD disaggregation: use --tp 16 --dp 16, and also set the MOE_MODE=EP environment variable. I don't know which GPUs you are running on; if they are H20, there may also be custom DeepEP adaptation issues.
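For concreteness, here is a minimal sketch of what the node-0 script might look like with that advice applied. It only adds MOE_MODE=EP and --dp 16; every other flag is copied unchanged from the pd_prefill_0.sh posted above, and this exact combination has not been verified here:

#pd_prefill_0.sh (dp 16 sketch, based on the reply above)
export host=10.24.62.3
export pd_master_ip=10.24.62.3
export nccl_host=10.24.62.3
# MOE_MODE=EP enables expert parallelism for the MoE layers, per the reply above
MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server \
    --model_dir /mnt/model/DeepSeek-R1 \
    --run_mode "prefill" \
    --tp 16 \
    --dp 16 \
    --host $host \
    --port 8019 \
    --nnodes 2 \
    --node_rank 0 \
    --nccl_host $nccl_host \
    --nccl_port 2732 \
    --enable_fa3 \
    --disable_cudagraph \
    --pd_master_ip $pd_master_ip \
    --pd_master_port 8000

The node-1 script would presumably differ only in host (10.24.62.9) and --node_rank 1, as in the original pair of scripts.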
That said, in dp 16 mode, prefill latency is not especially good.