FastDeploy
[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports
Motivation
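This PR has two goals:
- Simplify the deployment flow and parameters for PD disaggregation, including port configuration, RDMA NIC detection, and the related environment variable setup, so that deployment reduces to three steps: [start the Router] → [start the P and D instances] → [deployment complete]. The simplified launch parameters are also intended to apply to centralized deployment and multi-TP/DP deployment, and to stay compatible with launching multi-DP services through both APIServer and MultiAPIServer.
- Refactor how port-related configuration is processed and used in the current code. During argument initialization, if the user does not specify a port, an available port is found automatically; this must work for both the online service and the offline interface. In multi-DP deployments, the ports each DP needs are split out during config initialization rather than ad hoc at the point of use. The aim is to keep configuration as static and read-only as possible and to reduce config changes at runtime.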
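The automatic port lookup described above can be illustrated with a minimal sketch. The helper names `find_free_ports` and `resolve_ports` are hypothetical and are not taken from this PR; the sketch only assumes that a free port can be obtained by binding a TCP socket to port 0 and reading back the port the OS chose.

```python
from __future__ import annotations

import socket


def find_free_ports(count: int, host: str = "127.0.0.1") -> list[int]:
    """Ask the OS for `count` distinct free TCP ports (hypothetical helper)."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Binding to port 0 lets the kernel choose an unused port.
            sock.bind((host, 0))
            sockets.append(sock)
        # All sockets stay open until every port is collected, so the
        # kernel cannot hand out the same port twice.
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


def resolve_ports(user_ports: list[int] | None, required: int) -> list[int]:
    """Use the user's ports when given, otherwise allocate `required` ports."""
    if user_ports:
        if len(user_ports) != required:
            raise ValueError(f"expected {required} ports, got {len(user_ports)}")
        return list(user_ports)
    return find_free_ports(required)
```

A port found this way can still be taken by another process between the lookup and the moment the service binds it, so a real implementation would likely need a retry or a re-check at bind time.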
Modifications
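- ArgsUtils
  - Add the `post_init_all_ports` argument post-processing and validation step. When `EngineArgs` is initialized, it checks that the number of ports the user passed for each port type is correct. If the user passed no ports, the required number of ports is allocated automatically.
- FDConfig
  - Remove the old, non-standard type conversion logic. During config initialization, the `parse_ports()` method converts all `*_port` variables to `list[int]`.
  - 🌟 Add `local_*` port variables: `local_engine_worker_queue_port` (int), `local_rdma_comm_ports` (list[int]), `local_pd_comm_port` (int), and `local_cache_queue_port` (int). In DP/EP scenarios they refer to the ports used by the current DP; non-DP/EP scenarios also use the `local_*` port variables for consistency.
  - 🌟 Add the `postprocess_devices_and_ports` config post-processing step. In `FDConfig.postprocess`, the `local_*` port variables are assigned in one place by slicing out the ports the current DP needs. Modifying the `FDConfig` object outside `config.py` is discouraged.
- MultiAPIServer
  - Add argument validation: if the user passes no ports, the required number of ports is allocated automatically.
  - If the user passes the wrong number of ports, the required number of ports is reallocated.
  - Set the `FD_ENABLE_MULTI_API_SERVER` environment variable by default.
- Cache
  - Rename the `engine_pid` parameter accepted by `CacheTransferManager` and `CacheMessager` to `ipc_suffix`, which better matches its meaning.
  - 🌟 Add initialization code to `RDMACacheTransfer` that automatically sets the `KVCACHE_RDMA_NICS` and `KVCACHE_GDRCOPY_FLUSH_ENABLE` environment variables.
- CommonEngine
  - Remove some port-list slicing and type conversion logic (now handled in the config layer).
  - Change which `llm_logger` object is used: in DP scenarios, logs should be written to the `_dprank*.log` files.
- Utils
  - Add utility functions for checking ports, parsing ports, and automatically finding available ports.
- Examples
  - Simplify the launch commands in `start_v1_dp2.sh` and `start_v1_tp1.sh`, add the new example `start_v1_tp2.sh`, and improve the helper functions in `utils.sh`.
- Others
  - 🌟 Update all port usages to use the `local_*` port variables.
  - Support launching multiple DPs through the API server.
  - Remove the EP restriction from some DP logic.
  - When DP0 creates DP1-N, deep-copy DP0's current cfg for each DP, so that config modifications made inside DP1-N do not interfere with one another.
  - When `ExpertService` is initialized, rewrite part of the current DP's config according to `local_data_parallel_id`.
  - 🌟 Have each DP create its own `EngineCacheQueue` service instead of sharing one across all DPs, matching the `EngineWorkerQueue` architecture.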
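The idea of normalizing port arguments once and slicing out the current DP's share at config time, rather than at each point of use, can be sketched as follows. This is not the code from this PR: `normalize_ports`, `LocalPorts`, and `slice_local_ports` are hypothetical names, and the layout (one worker-queue port per DP, RDMA ports split evenly across DPs) is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


def normalize_ports(value) -> list[int]:
    """Turn an int, a comma-separated string, or a sequence into list[int]."""
    if value is None:
        return []
    if isinstance(value, int):
        return [value]
    if isinstance(value, str):
        return [int(item) for item in value.split(",") if item.strip()]
    return [int(item) for item in value]


@dataclass(frozen=True)
class LocalPorts:
    """Read-only view of the ports owned by one DP rank (hypothetical)."""

    engine_worker_queue_port: int
    rdma_comm_ports: list[int]


def slice_local_ports(engine_worker_queue_ports, rdma_comm_ports,
                      dp_size: int, dp_rank: int) -> LocalPorts:
    """Pick the ports for `dp_rank`, assuming an even split across DPs."""
    if not 0 <= dp_rank < dp_size:
        raise ValueError(f"dp_rank {dp_rank} is outside [0, {dp_size})")
    queue_ports = normalize_ports(engine_worker_queue_ports)
    rdma_ports = normalize_ports(rdma_comm_ports)
    if len(queue_ports) != dp_size:
        raise ValueError(f"need {dp_size} queue ports, got {len(queue_ports)}")
    if len(rdma_ports) % dp_size != 0:
        raise ValueError("RDMA ports cannot be split evenly across DPs")
    per_dp = len(rdma_ports) // dp_size
    start = dp_rank * per_dp
    return LocalPorts(
        engine_worker_queue_port=queue_ports[dp_rank],
        rdma_comm_ports=rdma_ports[start:start + per_dp],
    )
```

Because the result is a frozen dataclass computed once, downstream code reads the per-DP values directly and has no reason to index into the global port lists.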
Usage or Command
bash examples/splitwise/start_v1_dp2.sh
Accuracy Tests
$ bash examples/splitwise/start_v1_dp2.sh
ROUTER_PORT: 8274
nohup: redirecting stderr to stdout
P_SERVER_PORTS: 8629,8631
nohup: redirecting stderr to stdout
-------- WAIT FOR HEALTH --------
Port 8629: [OK] 200
Port 8631: [OK] 200
All services are ready! [38s]
---------------------------------
D_SERVER_PORTS: 8534,8535
nohup: redirecting stderr to stdout
-------- WAIT FOR HEALTH --------
Port 8534: [OK] 200
Port 8535: [OK] 200
All services are ready! [37s]
---------------------------------
{"id":"chatcmpl-3dcecc9a-cf59-45a1-876a-e66535c6ef23","object":"chat.completion","created":1765535525,"model":"/root/paddlejob/workspace/env_run/output/models/ERNIE-4.5-0.3B-Paddle/","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?","multimodal_content":null,"reasoning_content":null,"audio_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"draft_logprobs":null,"prompt_logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":8,"total_tokens":18,"completion_tokens":10,"prompt_tokens_details":{"cached_tokens":0,"image_tokens":0,"video_tokens":0},"completion_tokens_details":{"reasoning_tokens":0,"image_tokens":0}}}
Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`,`[Scheduler]`,`[PD Disaggregation]`,`[Executor]`,`[Graph Optimization]`,`[Speculative Decoding]`,`[RL]`,`[Models]`,`[Quantization]`,`[Loader]`,`[OP]`,`[KVCache]`,`[DataProcessor]`,`[BugFix]`,`[Docs]`,`[CI]`,`[Optimization]`,`[Feature]`,`[Benchmark]`,`[Others]`,`[XPU]`,`[HPU]`,`[GCU]`,`[DCU]`,`[Iluvatar]`,`[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Codecov Report
:x: Patch coverage is 68.51852% with 85 lines in your changes missing coverage. Please review.
:warning: Please upload report for BASE (develop@c9b47f9).
Additional details and impacted files
@@ Coverage Diff @@
## develop #5415 +/- ##
==========================================
Coverage ? 62.08%
==========================================
Files ? 329
Lines ? 41287
Branches ? 6295
==========================================
Hits ? 25633
Misses ? 13701
Partials ? 1953
| Flag | Coverage Δ | |
|---|---|---|
| GPU | 62.08% <68.51%> (?) |