Anna Shors

Results: 4 issues by Anna Shors

Adds support for NVIDIA's [Transformer Engine](https://github.com/NVIDIA/TransformerEngine). TE can be enabled by setting the environment variable `ENABLE_TE=1`. For more details about running Pax with Transformer Engine, refer to the [JAX Toolbox...
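A minimal sketch of how a launcher script might read the `ENABLE_TE` flag described above. The helper name `te_enabled` is hypothetical, and treating only the string `"1"` as enabled is an assumption; this is not Pax's actual code.

```python
import os


def te_enabled(env=None):
    """Return True when ENABLE_TE is set to "1".

    Hypothetical helper for illustration only. `env` defaults to the
    process environment and can be overridden with a plain dict.
    """
    env = os.environ if env is None else env
    # Assumption: only the exact string "1" turns the feature on.
    return env.get("ENABLE_TE", "0") == "1"


if __name__ == "__main__":
    print("Transformer Engine enabled:", te_enabled())
```

Running a script with `ENABLE_TE=1` in the environment would then take the Transformer Engine path, while leaving the variable unset keeps the default path.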

Refactors gradient clipping so that it is performed on the full batch rather than on each subbatch when using `ShardedStaticAccumulator`. Note that this refactor allows us to maintain support for `enable_skip_step_on_gradient_anomalies` and requires...

pull ready
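To see why the order of clipping and accumulation matters, here is a small pure-Python sketch. It assumes clipping by global L2 norm and mean accumulation over subbatches; all function names are illustrative and none of this is taken from `ShardedStaticAccumulator` itself.

```python
import math


def global_norm(grad):
    """L2 norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def clip_by_global_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = global_norm(grad)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def mean_grads(grads):
    """Element-wise mean of several gradient vectors."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]


# Two subbatches with very different gradient magnitudes (made-up values).
subbatch_grads = [[3.0, 4.0], [0.3, 0.4]]
max_norm = 1.0

# Clip each subbatch gradient first, then accumulate.
clip_then_mean = mean_grads(
    [clip_by_global_norm(g, max_norm) for g in subbatch_grads]
)

# Accumulate over the full batch first, then clip once.
mean_then_clip = clip_by_global_norm(mean_grads(subbatch_grads), max_norm)

print(global_norm(clip_then_mean))
print(global_norm(mean_then_clip))
```

With these values the per-subbatch variant yields an update whose norm is below the threshold, because the large subbatch gradient is shrunk before averaging. The full-batch variant clips the averaged gradient exactly to `max_norm`, which matches what training without gradient accumulation would do.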

# What does this PR do ? Add a one line overview of what this PR aims to accomplish. **Collection**: [Note which collection this PR will affect] # Changelog -...

stale

# What does this PR do ? This PR provides a fallback code path that defaults to pure PyTorch/JIT when TE and Apex are not installed. This depends on https://github.com/NVIDIA/Megatron-LM/pull/893 **Collection**:...

NLP
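The fallback idea in the last entry can be sketched with an optional-dependency check. This is an illustration only: the function names and the preference order (TE first, then Apex, then plain PyTorch) are assumptions and are not taken from the PR.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def select_backend(have_te, have_apex):
    """Pick an implementation based on which optional packages exist.

    Assumed order for illustration: Transformer Engine, then Apex,
    then the pure PyTorch fallback.
    """
    if have_te:
        return "transformer_engine"
    if have_apex:
        return "apex"
    return "pytorch"


# Probe the environment once, typically at import time.
backend = select_backend(
    is_installed("transformer_engine"),
    is_installed("apex"),
)
print("Selected backend:", backend)
```

Keeping the probe (`is_installed`) separate from the decision (`select_backend`) makes the fallback logic easy to test on machines where neither optional package is present.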