MS-AMP
Microsoft Automatic Mixed Precision Library
**Description** The argument `model_state.use_fp8_ddp` is deprecated. In the MS-AMP examples, `model_state.use_fp8_ddp` is always set to True, and the function `optimizer.all_reduce_grads` is never used. **Major Revision** - Remove `model_state.use_fp8_ddp` -...
I have tried your example; however, when I try to export the model to ONNX, it raises an error.
**Description** Support AMD MI300 GPU
Hi, I have some questions related to the paper: 1) Which FP8 format (E4M3 / E5M2) do you use for the first Adam moment? Do you use delayed scaling or...
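For readers unfamiliar with the two FP8 formats mentioned in this question: E4M3 (4 exponent bits, 3 mantissa bits) trades range for precision, while E5M2 (5 exponent bits, 2 mantissa bits) does the opposite. A minimal stdlib sketch, not taken from MS-AMP, that decodes a raw FP8 byte under either layout and makes the trade-off concrete:

```python
def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode one FP8 byte with the given exponent/mantissa split.

    Uses the standard IEEE-style layout: sign bit, biased exponent,
    mantissa; exponent 0 encodes subnormals. NaN/Inf encodings of the
    OCP spec are ignored for simplicity.
    """
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Largest finite E4M3 value (0b0.1111.110): (1 + 6/8) * 2^8 = 448.0
print(decode_fp8(0b01111110, exp_bits=4, man_bits=3))
# Largest finite E5M2 value (0b0.11110.11): (1 + 3/4) * 2^15 = 57344.0
print(decode_fp8(0b01111011, exp_bits=5, man_bits=2))
```

E5M2 reaches roughly 128x further in magnitude, which is why it is commonly used for gradients, while E4M3's extra mantissa bit suits weights and activations.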
**Description** See #168. This is the most non-invasive fix I could come up with. Thanks to @aliencaocao for idea. **Minor Revision** - adds `msamp.common.tensor.tensor.pretend_scaling_is_torch`, which can be used to fix...
**What's the issue, what's expected?**: `python mnist.py --enable-msamp --opt-level=O2` should work with the versions pinned in `pyproject.toml`. Specifically, it should work with `torch==2.2.1`, given that torch is unpinned. **How to...
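For context on the pinning mentioned above, a dependency constraint in a PEP 621-style `pyproject.toml` would look like the following sketch (the section layout is an assumption about this project's packaging, not a quote from its actual file):

```toml
[project]
dependencies = [
    # Pin torch to the version the issue reports as expected to work
    "torch==2.2.1",
]
```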
Hi, I'm wondering if the TFLOPs/MFU numbers in Table 5 of the paper use activation checkpointing? I've looked through the MS-AMP-Examples repo and it seems like GPT3 Megatron does...
# Release Manager @cp5555 # Endgame - [x] Code freeze: Feb. 9th, 2024 - [x] Bug Bash date: Feb. 12th, 2024 - [x] Release date: Feb. 23rd, 2024 # Main...
**What's the issue, what's expected?**: I would like to apply MS-AMP to only parts of the model that are less sensitive to reduced precision. **Additional information**: Some parts of models...
Thank you for such a great and exciting project! **What would you like to be added**: Support for the latest Megatron-LM and transformer-engine 1.0+ **Why is this needed**: latest...