accelerate
[Docs] Update low-precision training docs for MS-AMP
Hi team, I wanted to suggest updating the low-precision training docs, especially the parts related to MS-AMP, which no longer appears to be maintained. When I was trying to get this working I ran into a few issues:
- MS-AMP requires a container with it pre-installed. While the project provides base images, they are built on much older CUDA versions (11.8/12.1), so using a newer CUDA version requires building MS-AMP from source.
- MS-AMP requires MSCCL, a fork of NCCL, for multi-GPU communication with FP8, but it is pinned to a much older NCCL version and hasn't been updated in 2-3 years. This leads to symbol conflicts when using recent versions of PyTorch, which are built against newer NCCL.
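The second bullet can be made concrete with a small sketch. The helper below is purely illustrative (not part of Accelerate or MS-AMP), and the version numbers are assumptions for the example: it treats any NCCL major/minor mismatch between a PyTorch build and an MSCCL fork as a likely source of symbol conflicts.

```python
# Hedged sketch: flag a likely NCCL symbol conflict between the NCCL version
# a PyTorch wheel was built against and the NCCL version an MSCCL fork is
# based on. The specific version tuples below are illustrative assumptions.

def nccl_compatible(torch_nccl: tuple, msccl_nccl: tuple) -> bool:
    """Conservatively require matching NCCL major/minor versions, since the
    library's symbols are not guaranteed stable across minor releases."""
    return tuple(torch_nccl[:2]) == tuple(msccl_nccl[:2])

# e.g. a recent PyTorch built against NCCL 2.20 vs an old MSCCL fork of 2.12
print(nccl_compatible((2, 20, 5), (2, 12, 10)))  # False: likely conflicts
```

On a real system one could fill in the first tuple from `torch.cuda.nccl.version()`; the MSCCL side has to be read off the fork's release notes.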
Are these valid? If so, I would recommend flagging MS-AMP as deprecated and directing users to TE or other approaches.
cc @muellerzr if you have any insights, since you added this. A good first step is probably to advise users to use TE or torchao for FP8. Indeed, it looks like MS-AMP is no longer maintained, which can make it hard to work with, as you experienced.
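For readers landing here, the switch being recommended can be sketched as below. This is a hedged example, not the final docs wording: `FP8RecipeKwargs` and the `"te"`/`"msamp"` backend strings are taken from the Accelerate low-precision training docs and should be verified against your installed version; the `fp8_recipe_kwargs` helper is hypothetical, added only to make the example self-contained.

```python
# Hedged sketch: prefer the TransformerEngine ("te") FP8 backend over the
# unmaintained MS-AMP ("msamp") one. The helper below is hypothetical; it
# just builds the kwargs one would pass to accelerate.utils.FP8RecipeKwargs.

def fp8_recipe_kwargs(backend: str = "te") -> dict:
    """Return keyword arguments for an FP8 recipe; 'te' = TransformerEngine,
    'msamp' = MS-AMP (deprecated/unmaintained)."""
    if backend not in {"te", "msamp"}:
        raise ValueError(f"unknown FP8 backend: {backend!r}")
    return {"backend": backend}

# Actual Accelerate usage (commented out: requires accelerate plus a CUDA
# build of transformer_engine, so it won't run everywhere):
#
# from accelerate import Accelerator
# from accelerate.utils import FP8RecipeKwargs
# accelerator = Accelerator(
#     mixed_precision="fp8",
#     kwargs_handlers=[FP8RecipeKwargs(**fp8_recipe_kwargs("te"))],
# )

print(fp8_recipe_kwargs("te"))  # {'backend': 'te'}
```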
Yeah, last I checked I had wanted to deprecate/remove MS-AMP since it's no longer maintained. If you want to get on that before I'm back, @SunMarc, feel free :D (~6 versions/mo out I think is fine since it breaks for users anyways)
Perfect, thanks for the confirmation! I'll do that soon.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @SunMarc, I would like to update the docs for this. I will keep the details on MS-AMP for historical context but highlight the challenges, recommend TE, and remove the language about using the two in combination. Please assign the ticket to me and let me know if any related code changes are needed. Thanks!