
Updating the colossalai version and fixing a new-version incompatibility in the source gives better performance

Open lyx3911 opened this issue 1 year ago • 1 comments

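Update colossalai to 0.4.2, then in opensora/utils/train_utils.py change line 35 from master_param = optimizer._param_store.working_to_master_param[param_id] to master_param = optimizer.working_to_master_param[param_id]. This gives better performance, a gain of about 10%, because the new version of colossalai aggregates the communication in optimizer.step() and so reduces the number of communication calls.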
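For code that has to run against both versions, a version-tolerant lookup is one option. The following is a minimal sketch that assumes only the two attribute paths quoted above. The helper name get_working_to_master_map is hypothetical and is not part of Open-Sora or colossalai.

```python
def get_working_to_master_map(optimizer):
    """Return the working-to-master parameter mapping from either attribute layout.

    Hypothetical helper: it only probes the two attribute paths quoted in this
    issue and makes no other assumption about the optimizer class.
    """
    # Newer layout (reported for colossalai 0.4.2): mapping sits on the optimizer.
    mapping = getattr(optimizer, "working_to_master_param", None)
    if mapping is not None:
        return mapping

    # Older layout: mapping sits on the private _param_store attribute.
    param_store = getattr(optimizer, "_param_store", None)
    mapping = getattr(param_store, "working_to_master_param", None)
    if mapping is not None:
        return mapping

    raise AttributeError(
        "optimizer exposes neither working_to_master_param nor "
        "_param_store.working_to_master_param"
    )
```

The patched line would then read master_param = get_working_to_master_map(optimizer)[param_id], which should behave the same on either version as long as one of the two attributes exists.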

lyx3911, Aug 05 '24 11:08

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot], Aug 13 '24 01:08

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Aug 21 '24 01:08

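@lyx3911 In my single-node tests I see almost no improvement. Could you say in which scenarios the speedup shows up? Does the overlap_allgather parameter need to be set to True? Thanks.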

flymin, Aug 28 '24 08:08

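I am training on multiple GPUs. The improvement is quite noticeable in stage 1 and less so in stages 2 and 3. I think the gain mainly shows up when communication takes a large share of the step time.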
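A rough way to see why the gain depends on the communication share is an Amdahl-style estimate. The sketch below is only an illustration: the function name step_speedup and the input values in the usage note are hypothetical and are not measurements from Open-Sora or colossalai.

```python
def step_speedup(comm_fraction, comm_reduction):
    """Estimate overall step speedup when only communication gets faster.

    comm_fraction:  share of step time spent in communication, in [0, 1].
    comm_reduction: factor by which communication time shrinks, >= 1.
    Both inputs are hypothetical; measure them with a profiler in practice.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be in [0, 1]")
    if comm_reduction < 1.0:
        raise ValueError("comm_reduction must be >= 1")
    # Compute time is unchanged; only the communication share is divided.
    new_time = (1.0 - comm_fraction) + comm_fraction / comm_reduction
    return 1.0 / new_time
```

For example, comparing step_speedup(0.30, 2.0) with step_speedup(0.05, 2.0) shows that the same reduction in communication time yields a much smaller overall gain when communication is only a small part of the step, which would be consistent with seeing little change on a single node.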

lyx3911, Sep 15 '24 03:09