MS-AMP
Microsoft Automatic Mixed Precision Library
**Description** The argument `model_state.use_fp8_ddp` is deprecated. In the MS-AMP examples, `model_state.use_fp8_ddp` is always set to True, and the function `optimizer.all_reduce_grads` is never used. **Major Revision** - Remove `model_state.use_fp8_ddp` -...
I have tried your example; however, when I try to export the model to ONNX, it raises an error.
**Description** Support AMD MI300 GPU
Hi, I have some questions related to the paper: 1) Which FP8 format (E4M3 / E5M2) do you use for the first Adam moment? Do you use delayed scaling or...
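For readers unfamiliar with the two FP8 formats mentioned in this question: E4M3 (4 exponent bits, 3 mantissa bits) trades range for precision, while E5M2 (5 exponent bits, 2 mantissa bits) does the opposite. A minimal stdlib sketch, not taken from MS-AMP, that decodes a raw FP8 byte under either layout and makes the trade-off concrete:

```python
def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode one FP8 byte with the given exponent/mantissa split.

    Uses the standard IEEE-style layout: sign bit, biased exponent,
    mantissa; exponent 0 encodes subnormals. NaN/Inf encodings of the
    OCP spec are ignored for simplicity.
    """
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Largest finite E4M3 value (0b0.1111.110): (1 + 6/8) * 2^8 = 448.0
print(decode_fp8(0b01111110, exp_bits=4, man_bits=3))
# Largest finite E5M2 value (0b0.11110.11): (1 + 3/4) * 2^15 = 57344.0
print(decode_fp8(0b01111011, exp_bits=5, man_bits=2))
```

E5M2 reaches roughly 128x further in magnitude, which is why it is commonly used for gradients, while E4M3's extra mantissa bit suits weights and activations.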
**Description** See #168. This is the most non-invasive fix I could come up with. Thanks to @aliencaocao for idea. **Minor Revision** - adds `msamp.common.tensor.tensor.pretend_scaling_is_torch`, which can be used to fix...
**What's the issue, what's expected?**: `python mnist.py --enable-msamp --opt-level=O2` should work with the versions pinned in `pyproject.toml`. Specifically, it should work with `torch==2.2.1`, given that torch is unpinned. **How to...
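For context on the pinning mentioned above, a dependency constraint in a PEP 621-style `pyproject.toml` would look like the following sketch (the section layout is an assumption about this project's packaging, not a quote from its actual file):

```toml
[project]
dependencies = [
    # Pin torch to the version the issue reports as expected to work
    "torch==2.2.1",
]
```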
Hi, I'm wondering if the TFLOPs/MFU numbers in Table 5 of the paper use activation checkpointing? I've looked through the MS-AMP-Examples repo and it seems like GPT3 Megatron does...
# Release Manager @cp5555 # Endgame - [x] Code freeze: Feb. 9th, 2024 - [x] Bug Bash date: Feb. 12th, 2024 - [x] Release date: Feb. 23rd, 2024 # Main...
**What's the issue, what's expected?**: I would like to apply MS-AMP to only parts of the model that are less sensitive to reduced precision. **Additional information**: Some parts of models...
Thank you for such a great and exciting project! **What would you like to be added**: Support for the latest Megatron-LM and transformer-engine 1.0+ **Why is this needed**: latest...