
[FEATURE] Reduce the required peak RAM on a single node while converting weights

Open zhanyuanucb opened this issue 1 year ago • 3 comments

System information

  • Alpa version: v0.2.2
  • Are you willing to contribute to it (Yes/No): Yes, but not immediately

Describe the new feature and the current behavior/state

Referring to here: currently, weight conversion for OPT-175B requires peak RAM as large as twice the model size. It would be great to do this in a distributed way to reduce the required peak RAM on a single node.
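For illustration only, here is a minimal sketch of one way to keep single-node peak RAM close to the size of one shard: stream each shard into memory-mapped output arrays so the consolidated weights live on disk rather than in memory. This is not the actual Alpa script; the helper name, the shard file format, and the assumption that per-shard slices concatenate along axis 0 are all hypothetical.

```python
# Sketch, not Alpa's implementation: stream shards into memory-mapped .npy
# files so the consolidated model never has to fit in RAM alongside the shards.
import numpy as np

def consolidate_streaming(shard_paths, tensor_shapes, tensor_dtypes, out_dir):
    # Pre-create one memory-mapped .npy file per tensor (shapes/dtypes are
    # assumed to be known ahead of time from the checkpoint metadata).
    outs = {
        name: np.lib.format.open_memmap(
            f"{out_dir}/{name}.npy", mode="w+",
            dtype=tensor_dtypes[name], shape=shape)
        for name, shape in tensor_shapes.items()
    }
    offsets = {name: 0 for name in tensor_shapes}
    for path in shard_paths:
        # Only one shard is resident in RAM at a time.
        shard = np.load(path, allow_pickle=True).item()  # dict: name -> slice
        for name, piece in shard.items():
            start = offsets[name]
            # Assumption: shard slices concatenate along axis 0.
            outs[name][start:start + piece.shape[0]] = piece
            offsets[name] += piece.shape[0]
        del shard
    for mm in outs.values():
        mm.flush()
```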

Will this change the current API? How?

Changes will mostly happen in step_2_consolidate_992_shards_to_singleton.py.
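To sketch what a distributed version might look like, the consolidation could be split across Ray workers so that no single node materializes the whole model. Everything below is an assumption for illustration: the helper consolidate_group(), the shard paths, and the tensor grouping are hypothetical, and the only thing taken from the Alpa setup is that a Ray cluster is already available.

```python
# Rough sketch under stated assumptions: each Ray worker consolidates only a
# disjoint group of tensors, so per-node peak RAM scales roughly with
# model_size / number_of_groups plus one shard.
import numpy as np
import ray

ray.init(address="auto")  # attach to the existing Ray cluster

@ray.remote
def consolidate_group(shard_paths, tensor_names, out_dir):
    pieces = {name: [] for name in tensor_names}
    for path in shard_paths:
        shard = np.load(path, allow_pickle=True).item()  # dict: name -> slice
        for name in tensor_names:
            if name in shard:
                pieces[name].append(shard[name])
        del shard  # drop the rest of the shard; keep only this group's slices
    for name, parts in pieces.items():
        np.save(f"{out_dir}/{name}.npy", np.concatenate(parts, axis=0))

# Hypothetical inputs; real names come from the metaseq checkpoint layout.
shard_paths = [f"checkpoints/shard_{i:03d}.npy" for i in range(992)]
groups = [["decoder.layers.0.fc1.weight"], ["decoder.layers.0.fc2.weight"]]
ray.get([consolidate_group.remote(shard_paths, g, "consolidated") for g in groups])
```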

Describe alternatives you've considered

Additional context

zhanyuanucb · Dec 01 '22 17:12