[Cherry-Pick][RL] R3 Support RDMA Store
Motivation
:bulb: If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)
Performance comparison between RoutingStoreLocal and RoutingStoreRDMA:
1. paddle.load overhead:
Number of successfully `get` files: 37/37
Mean overhead: 0.0395 s
Min overhead: 0.0127 s
Max overhead: 0.1590 s
Total overhead: 1.4610 s
2. paddle.save overhead:
Number of successfully `save` files: 37/37
Mean overhead: 0.0872 s
Min overhead: 0.0692 s
Max overhead: 0.1226 s
Total overhead: 3.2273 s
3. p2pstore.put overhead:
Number of successfully `put` files: 37/37
Mean overhead: 0.0073 s
Min overhead: 0.0063 s
Max overhead: 0.0076 s
Total overhead: 0.2691 s
4. p2pstore.get overhead:
Number of successfully `get` files: 37/37
Mean overhead: 0.0027 s
Min overhead: 0.0027 s
Max overhead: 0.0029 s
Total overhead: 0.1008 s
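As a rough reading of the numbers above, the ratios between the two paths can be computed directly. The sketch below assumes that `paddle.save`/`paddle.load` represent the `RoutingStoreLocal` path and `p2pstore.put`/`p2pstore.get` the `RoutingStoreRDMA` path. The dictionary layout and the `speedup` helper are illustrative only and not part of the PR.

```python
# Mean and total overheads (seconds) copied from the measurements above.
# Pairing save<->put and load<->get is an assumption made for illustration.
overheads = {
    "paddle.load": {"mean": 0.0395, "total": 1.4610},
    "paddle.save": {"mean": 0.0872, "total": 3.2273},
    "p2pstore.put": {"mean": 0.0073, "total": 0.2691},
    "p2pstore.get": {"mean": 0.0027, "total": 0.1008},
}


def speedup(baseline: str, candidate: str, key: str = "mean") -> float:
    """Return how many times smaller the candidate overhead is than the baseline."""
    return overheads[baseline][key] / overheads[candidate][key]


write_speedup = speedup("paddle.save", "p2pstore.put")
read_speedup = speedup("paddle.load", "p2pstore.get")

print(f"write path: {write_speedup:.1f}x lower mean overhead")
print(f"read path:  {read_speedup:.1f}x lower mean overhead")
```

Under that pairing, the mean write overhead on these 37 files is roughly 12x lower and the mean read overhead roughly 15x lower with p2pstore.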
release/2.4 PR: https://github.com/PaddlePaddle/FastDeploy/pull/5468
develop PR: https://github.com/PaddlePaddle/FastDeploy/pull/5467
Modifications
Add `RoutingStoreRDMA`, which uses P2P communication to transmit routing data.
- Routing data is stored in the `WorkerProcess` process where the `RoutingStoreRDMA` is located and is not actively released.
- The `p2pstore` dependency library and `RoutingStoreRDMA` can only be used in RLHF with PaddlePaddle.
Usage or Command
Add new parameters for RoutingReplayConfig:
--routing-replay-config '{"enable_routing_replay":true, "routing_store_type":"rdma", "rdma_store_server":"zmq://x.x.x.x:5765,x.x.x.x:5766"}'
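Because the value passed to `--routing-replay-config` is a JSON string, building it programmatically can help avoid quoting mistakes. The following is a minimal sketch: the keys come from the command above, while the host addresses are placeholders and the `build_routing_replay_arg` helper is hypothetical, not part of FastDeploy.

```python
import json
import shlex


def build_routing_replay_arg(servers: list[str]) -> str:
    """Serialize a routing replay config with the RDMA store enabled.

    `servers` is a list of "host:port" strings. Joining them with commas
    behind a single "zmq://" prefix mirrors the example above; the exact
    address grammar accepted by the server is an assumption.
    """
    config = {
        "enable_routing_replay": True,
        "routing_store_type": "rdma",
        "rdma_store_server": "zmq://" + ",".join(servers),
    }
    return json.dumps(config)


# Placeholder addresses; substitute the real store server hosts.
arg = build_routing_replay_arg(["x.x.x.x:5765", "x.x.x.x:5766"])
print("--routing-replay-config", shlex.quote(arg))
```

`shlex.quote` wraps the JSON in single quotes so the shell passes it through as one argument.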
Accuracy Tests
Checklist
- [x] Add at least one tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`,`[Scheduler]`,`[PD Disaggregation]`,`[Executor]`,`[Graph Optimization]`,`[Speculative Decoding]`,`[RL]`,`[Models]`,`[Quantization]`,`[Loader]`,`[OP]`,`[KVCache]`,`[DataProcessor]`,`[BugFix]`,`[Docs]`,`[CI]`,`[Optimization]`,`[Feature]`,`[Benchmark]`,`[Others]`,`[XPU]`,`[HPU]`,`[GCU]`,`[DCU]`,`[Iluvatar]`,`[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.