Victor Zhu

7 issues by Victor Zhu

*Description of changes:* Add a GPT2 model training script with HuggingFace Trainer and SageMaker Model Parallel ## Merge Checklist - [x] I have read the [CONTRIBUTING](https://github.com/aws/amazon-sagemaker-examples/blob/master/CONTRIBUTING.md) doc and adhered to the...

# What does this PR do? This PR adds support for SageMaker Sharded Data Parallel with SMP version >= 1.15. We mainly follow DeepSpeed's checkpointing logic in our integration. When...

… with Sharded Data Parallelism through a custom SMP Trainer. This example shows you how to use SMP Trainer as a drop-in replacement for HuggingFace Trainer to enable Sharded Data...

Hi, I'm seeing higher losses when using `te.Linear` instead of `nn.Linear` directly in transformer models such as Llama, which I assume is expected due to the nature of FP8. However, I don't...

*Issue #, if available:* *Description of changes:* Update the conda environment setup to install the latest PT2.3.1 TSM2.4.0 conda package and relevant dependencies. By submitting this pull request, I confirm that you...

*Issue #, if available:* *Description of changes:* *Testing done:* ## Merge Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR....

**Describe the bug** When enabling FP8 mixed precision during training of a Mixtral model (`SequentialMLP` expert layer), we observe that the training and validation losses differ more than expected. **To...