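Updating colossalai and fixing an incompatibility with the new version in the source code gives better performance
Update colossalai to version 0.4.2, and in opensora/utils/train_utils.py, line 35, change master_param = optimizer._param_store.working_to_master_param[param_id] to master_param = optimizer.working_to_master_param[param_id]. This gives better performance, a gain of about 10%, because the new version of colossalai aggregates the communication in optimizer.step(), which reduces the number of communication calls.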
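If the code has to run against both the old and the new colossalai, the one-line change above could be wrapped in a small lookup helper. This is only a sketch: `get_master_param` is a hypothetical name, not part of Open-Sora or colossalai, and the two `SimpleNamespace` objects are stand-ins for an optimizer, used so the snippet runs without colossalai installed. It assumes the new version exposes `working_to_master_param` directly on the optimizer and the old one only under `_param_store`, as described above.

```python
from types import SimpleNamespace


def get_master_param(optimizer, param_id):
    """Look up the master parameter for a working parameter id.

    Hypothetical helper: tries the attribute layout described for
    colossalai 0.4.2 first, then falls back to the older layout.
    """
    mapping = getattr(optimizer, "working_to_master_param", None)
    if mapping is None:
        # Older layout: the mapping lives on the private _param_store.
        mapping = optimizer._param_store.working_to_master_param
    return mapping[param_id]


# Stand-in objects that only mimic the two attribute layouts.
new_style = SimpleNamespace(working_to_master_param={1: "master-new"})
old_style = SimpleNamespace(
    _param_store=SimpleNamespace(working_to_master_param={1: "master-old"})
)

new_result = get_master_param(new_style, 1)
old_result = get_master_param(old_style, 1)
```

In train_utils.py the call would then read master_param = get_master_param(optimizer, param_id), with param_id computed exactly as before.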
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
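@lyx3911 I see almost no improvement in my single-node tests. Could you say in which specific scenarios you get a speedup? Does the overlap_allgather parameter need to be set to True? Thanks.
I'm training on multiple GPUs. The improvement is quite noticeable in stage 1 and less so in stages 2 and 3. I think the gain mainly shows up when communication takes a large share of the step time.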