Is there any example with a recent version of Megatron-LM?
Open
cryoco opened this issue 2 years ago · 1 comment
The examples shown here or here are based on Megatron-LM versions from about half a year ago. Are there any examples aligned with a recent version of Megatron-LM? Or, now that Megatron-LM has pipeline parallelism, does DeepSpeed still offer a relatively obvious optimization over it?