oneflow
Split plan push and pull
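- [x] Launch a group of threads in parallel
- [x] Each thread reads the full plan (concurrent reads are safe) and writes its own sub plan (the write goes to a thread-local variable, so it is safe)
- [x] Each sub plan contains only the information that a specific rank needs
- [x] Send the sub plans in parallel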
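The checklist above can be sketched in a few lines of Python. This is only an illustration of the threading pattern, not the code in `nn_graph.cpp`: `full_plan`, `kv_store`, `build_and_push_sub_plan` and `push_sub_plans` are hypothetical names, and a plain dict stands in for the key-value store that the plans are pushed to. The key format follows the `plan_GraphBase_0_r_1` names that appear in the logs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the full plan: a list of tasks, each tagged with a rank.
full_plan = [{"task_id": i, "rank": i % 4, "op": f"op_{i}"} for i in range(12)]

# Hypothetical stand-in for the key-value store used by push and pull.
kv_store = {}


def build_and_push_sub_plan(plan_name, rank):
    # Concurrent reads of the shared full plan are safe: no thread mutates it.
    # The sub plan is a local variable, so no other thread can see it being built.
    sub_plan = [task for task in full_plan if task["rank"] == rank]
    # Each thread writes a distinct key, so the pushes do not collide.
    key = f"{plan_name}_r_{rank}"
    kv_store[key] = sub_plan
    return key, len(sub_plan)


def push_sub_plans(plan_name, world_size):
    # Rank 0 builds the plan; one worker thread handles each of the other ranks.
    ranks = range(1, world_size)
    with ThreadPoolExecutor(max_workers=len(ranks)) as pool:
        return dict(pool.map(lambda r: build_and_push_sub_plan(plan_name, r), ranks))


sizes = push_sub_plans("plan_GraphBase_0", world_size=4)
print(sizes)
```

Keeping the full plan read-only while the workers run is what makes the scheme lock-free: the only shared write is the final store under a per-rank key.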
Could you attach the results of the optimization?
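So far I have only done the split: pass the sub graph to each rank, then execute.
Parallelism has not been added yet; I will measure the effect once it is in.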
Benchmarking the plan push/pull split
Plan sizes after splitting
E20220907 13:28:30.692723 4087476 nn_graph.cpp:362] rank id 0 plan size 16552127
E20220907 13:28:30.903695 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_1 size 4981886
E20220907 13:28:31.088564 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_2 size 4981854
E20220907 13:28:31.329953 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_3 size 4981966
E20220907 13:28:31.533762 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_4 size 4981922
E20220907 13:28:31.779922 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_5 size 4981902
E20220907 13:28:31.990777 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_6 size 4980294
E20220907 13:28:32.236011 4087476 nn_graph.cpp:391] rank id 0 push plan plan_GraphBase_0_r_7 size 4981786
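To put the sizes logged above in proportion, the short script below compares each sub plan with the full plan. The numbers are copied from the log; the unit is whatever `nn_graph.cpp` reports as `size` (presumably bytes, which is an assumption here).

```python
# Sizes copied from the log above.
full_plan_size = 16552127
sub_plan_sizes = {
    1: 4981886,
    2: 4981854,
    3: 4981966,
    4: 4981922,
    5: 4981902,
    6: 4980294,
    7: 4981786,
}

# Fraction of the full plan that each rank's sub plan represents.
ratios = {rank: size / full_plan_size for rank, size in sub_plan_sizes.items()}
for rank, ratio in ratios.items():
    print(f"rank {rank}: {sub_plan_sizes[rank]} ({ratio:.1%} of the full plan)")

# Total size pushed by rank 0 across all seven sub plans.
total_pushed = sum(sub_plan_sizes.values())
print(f"total pushed: {total_pushed}")
```

Each sub plan comes out at roughly 30% of the full plan, and the seven sub plans together add up to 34871610.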
Benchmark
T5, data parallel on 8 GPUs
Baseline
Graph name: GraphBase_0 Push or Pull plan time elapsed: 1126 milliseconds
Sub plans generated and sent sequentially
Graph name: GraphBase_0 Push or Pull plan time elapsed: 1789 milliseconds
Sub plans generated in parallel, sent sequentially
GraphBase_0 Push or Pull plan time elapsed: 994 milliseconds
Sub plans generated in parallel, sent in parallel
GraphBase_0 Push or Pull plan time elapsed: 702 milliseconds, a 38% reduction relative to the baseline (1126 milliseconds)
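The four measurements above can be compared with the baseline directly. The script below is a small worked calculation using only the numbers reported in this thread; the variant labels are descriptive names chosen for the example.

```python
baseline_ms = 1126

# Push/pull times reported above, in milliseconds.
variants = {
    "sequential generation, sequential send": 1789,
    "parallel generation, sequential send": 994,
    "parallel generation, parallel send": 702,
}


def relative_change(ms, base):
    # Negative values mean the variant is faster than the baseline.
    return (ms - base) / base


changes = {name: relative_change(ms, baseline_ms) for name, ms in variants.items()}
for name, change in changes.items():
    print(f"{name}: {variants[name]} ms ({change:+.1%} vs baseline)")
```

Splitting without parallelism is about 59% slower than the baseline, parallel generation alone is about 12% faster, and only the fully parallel variant reaches the 38% reduction.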
I20220908 14:55:18.135390 3711249 time_util.h:81] plan_GraphBase_0_r_3 add op attr time elapsed: 79 milliseconds
I20220908 14:55:18.145119 3711248 time_util.h:81] plan_GraphBase_0_r_2 add op attr time elapsed: 88 milliseconds
I20220908 14:55:18.145119 3711250 time_util.h:81] plan_GraphBase_0_r_4 add op attr time elapsed: 89 milliseconds
I20220908 14:55:18.145119 3711251 time_util.h:81] plan_GraphBase_0_r_5 add op attr time elapsed: 89 milliseconds
I20220908 14:55:18.145119 3711252 time_util.h:81] plan_GraphBase_0_r_6 add op attr time elapsed: 88 milliseconds
I20220908 14:55:18.145119 3711247 time_util.h:81] plan_GraphBase_0_r_1 add op attr time elapsed: 88 milliseconds
I20220908 14:55:18.145119 3711253 time_util.h:81] plan_GraphBase_0_r_7 add op attr time elapsed: 88 milliseconds
I20220908 14:55:18.175093 3711249 time_util.h:81] plan_GraphBase_0_r_3 add task time elapsed: 39 milliseconds
I20220908 14:55:18.185158 3711248 time_util.h:81] plan_GraphBase_0_r_2 add task time elapsed: 40 milliseconds
I20220908 14:55:18.185252 3711247 time_util.h:81] plan_GraphBase_0_r_1 add task time elapsed: 40 milliseconds
I20220908 14:55:18.185537 3711253 time_util.h:81] plan_GraphBase_0_r_7 add task time elapsed: 40 milliseconds
I20220908 14:55:18.185603 3711251 time_util.h:81] plan_GraphBase_0_r_5 add task time elapsed: 40 milliseconds
I20220908 14:55:18.186619 3711252 time_util.h:81] plan_GraphBase_0_r_6 add task time elapsed: 41 milliseconds
I20220908 14:55:18.186646 3711250 time_util.h:81] plan_GraphBase_0_r_4 add task time elapsed: 41 milliseconds
I20220908 14:55:18.249529 3711252 time_util.h:81] plan_GraphBase_0_r_6 PushKV time elapsed: 60 milliseconds
I20220908 14:55:18.250849 3711248 time_util.h:81] plan_GraphBase_0_r_2 PushKV time elapsed: 62 milliseconds
I20220908 14:55:18.262293 3711250 time_util.h:81] plan_GraphBase_0_r_4 PushKV time elapsed: 72 milliseconds
I20220908 14:55:18.270408 3711247 time_util.h:81] plan_GraphBase_0_r_1 PushKV time elapsed: 81 milliseconds
I20220908 14:55:18.373257 3711249 time_util.h:81] plan_GraphBase_0_r_3 PushKV time elapsed: 194 milliseconds
I20220908 14:55:18.421963 3711253 time_util.h:81] plan_GraphBase_0_r_7 PushKV time elapsed: 233 milliseconds
I20220908 14:55:18.423362 3711251 time_util.h:81] plan_GraphBase_0_r_5 PushKV time elapsed: 234 milliseconds
I20220908 14:55:18.423463 3708622 time_util.h:81] Graph name: GraphBase_0 Push plan time elapsed: 372 milliseconds
I20220908 14:55:18.641752 3708622 time_util.h:81] Graph name: GraphBase_0 Pull plan time elapsed: 218 milliseconds
I20220908 14:55:18.645865 3708622 time_util.h:81] Graph name: GraphBase_0 Clear plan time elapsed: 4 milliseconds
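One way to read the per-thread timings above is to sum the three phases for each sub plan and look at the slowest one. The sketch below does that for two of the sub plans, with the lines copied verbatim from the log. The regular expression is an assumption about the line layout, derived from these samples only.

```python
import re
from collections import defaultdict

# Lines copied from the log above (sub plans r_5 and r_3 only).
LOG = """\
I20220908 14:55:18.145119 3711251 time_util.h:81] plan_GraphBase_0_r_5 add op attr time elapsed: 89 milliseconds
I20220908 14:55:18.185603 3711251 time_util.h:81] plan_GraphBase_0_r_5 add task time elapsed: 40 milliseconds
I20220908 14:55:18.423362 3711251 time_util.h:81] plan_GraphBase_0_r_5 PushKV time elapsed: 234 milliseconds
I20220908 14:55:18.135390 3711249 time_util.h:81] plan_GraphBase_0_r_3 add op attr time elapsed: 79 milliseconds
I20220908 14:55:18.175093 3711249 time_util.h:81] plan_GraphBase_0_r_3 add task time elapsed: 39 milliseconds
I20220908 14:55:18.373257 3711249 time_util.h:81] plan_GraphBase_0_r_3 PushKV time elapsed: 194 milliseconds
"""

# Captures the sub plan name, the phase name, and the elapsed milliseconds.
PATTERN = re.compile(r"\] (plan_\S+) (.+?) time elapsed: (\d+) milliseconds")

per_plan = defaultdict(dict)
for line in LOG.splitlines():
    match = PATTERN.search(line)
    if match:
        name, phase, ms = match.groups()
        per_plan[name][phase] = int(ms)

# Total time each worker spent on its sub plan, summed across the three phases.
totals = {name: sum(phases.values()) for name, phases in per_plan.items()}
slowest = max(totals, key=totals.get)
print(slowest, totals[slowest], per_plan[slowest])
```

For `plan_GraphBase_0_r_5` the three phases sum to 363 milliseconds, close to the 372 milliseconds logged for the whole push. That is consistent with the push time being set by the slowest thread, with `PushKV` as its largest phase.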