MS-AMP

[#168 fix] add context manager to fake `ScalingTensor`/`ScalingParameter`'s `__class__` as `torch.Tensor`

Open 152334H opened this issue 11 months ago • 0 comments

Description

See #168. This is the least invasive fix I could come up with. Thanks to @aliencaocao for the idea.

Minor Revision

Adds `msamp.common.tensor.tensor.pretend_scaling_is_torch`, a context manager that can be used to fix `GradScaler().step()`.

This is a non-breaking change: behaviour is unchanged unless code explicitly runs inside `with pretend_scaling_is_torch():`.

Mar 06 '24 05:03 152334H