Saaketh Narayan
### Describe the bug

I'm using CUDA-aware OpenMPI that uses UCX (from one of NVIDIA's [PyTorch images](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-10.html), which has UCX installed as part of HPC-X) to perform collectives between GPUs....
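For context, a minimal sketch of the kind of GPU-to-GPU collective involved, assuming mpi4py >= 3.1 (which accepts buffers exposing `__cuda_array_interface__`) built against the same CUDA-aware Open MPI/UCX stack as the container:

```python
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# One GPU per rank; the buffer lives in device memory.
torch.cuda.set_device(rank % torch.cuda.device_count())
buf = torch.ones(1024, device="cuda")

# With CUDA-aware MPI, the device pointer is handed straight to UCX,
# with no staging copy through host memory.
comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)
```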
# What does this PR do?

This decouples max_duration from t_max in LR schedulers.

# What issue(s) does this change relate to?

# Before submitting

- [ ] Have you...
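A hedged sketch of what the decoupling allows, assuming Composer's public `CosineAnnealingScheduler`/`Trainer` API (`model`, `train_dataloader`, and `optimizer` are placeholders defined elsewhere):

```python
from composer import Trainer
from composer.optim import CosineAnnealingScheduler

# Anneal over the first half of training only, then hold the final LR.
scheduler = CosineAnnealingScheduler(t_max="0.5dur")

trainer = Trainer(
    model=model,                        # placeholder ComposerModel
    train_dataloader=train_dataloader,  # placeholder dataloader
    optimizers=optimizer,               # placeholder optimizer
    schedulers=scheduler,
    max_duration="10ep",  # training length, now independent of t_max
)
```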
Some custom FC layers will need custom kwargs. This PR enables that by changing `fc_type` from `str` to `Union[str, Dict]`, and converting it to a dict thereafter. Default configs have also...
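A rough sketch of the normalization described above; the exact keys used in the real configs are my assumption (`"name"` here is illustrative):

```python
from typing import Any, Dict, Union

def resolve_fc_type(fc_type: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Accept either a bare string or a dict carrying custom kwargs."""
    if isinstance(fc_type, str):
        # Wrap bare strings so downstream code only sees one config shape.
        return {"name": fc_type}
    return dict(fc_type)  # already a dict: layer name plus its custom kwargs

assert resolve_fc_type("torch") == {"name": "torch"}
assert resolve_fc_type({"name": "te", "bias": False})["bias"] is False
```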
Self-descriptive.
Hey, I'm using the `te_gemm` function defined in the PyTorch extensions [here](https://github.com/cli99/TransformerEngine/blob/6b21f606f2459d49c2113d69236d68d334edeb4c/transformer_engine/pytorch/csrc/extensions/gemm.cu#L10), and I'm trying to apply a scaling factor to the output. My gemm inputs are in fp8e4m3 and...
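This isn't the `te_gemm` API itself, just the numerical effect being asked about, written in plain PyTorch for reference (the shapes and the fp16 dtype are stand-ins for the fp8 case):

```python
import torch

a = torch.randn(128, 64, dtype=torch.float16, device="cuda")
b = torch.randn(64, 256, dtype=torch.float16, device="cuda")
scale = 0.5  # the scaling factor to fold into the GEMM output

# D = scale * (A @ B): the scale is applied to the accumulated result,
# not to the (already quantized) inputs.
out = torch.matmul(a, b) * scale
```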
# What does this PR do?

Marked as draft since it depends on #3434.

Disables tensor parallelism when the `tensor_parallelism_degree` is 1. This should be a no-op and any TP...
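A hedged sketch of the guard being described (the helper and config key names here are illustrative, not Composer's actual internals):

```python
from typing import Any, Dict, Optional

def maybe_tp_config(parallelism_config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the TP sub-config, or None when TP should be skipped."""
    tp = parallelism_config.get("tp")
    if tp is None or tp.get("tensor_parallelism_degree", 1) == 1:
        # Degree 1 means every rank already holds the full weights, so
        # applying TP would only add wrapper overhead: treat it as no TP.
        return None
    return tp
```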
# What does this PR do?

This fixes a bug where if the TP configuration (specified through `parallelism_config['tp']`) was passed in as a dict, it would not be correctly processed...
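The shape of the fix, sketched with an illustrative dataclass (`TPConfig` and its field are assumptions, not the real class):

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

@dataclass
class TPConfig:
    tensor_parallel_degree: int = 1

def coerce_tp(tp: Union[Dict[str, Any], TPConfig]) -> TPConfig:
    # Previously a plain dict fell through untouched and later attribute
    # accesses on it failed; coerce it into the expected config up front.
    return TPConfig(**tp) if isinstance(tp, dict) else tp
```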
# What does this PR do?

Resubmission of https://github.com/mosaicml/composer/pull/3394 -- using FA's CE Loss results in lower peak reserved memory usage and higher throughput. We are not adding flash attention...
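Since flash attention isn't being added as a required dependency, the opt-in presumably looks something like this try/except import pattern (a sketch, not the PR's actual code):

```python
import torch.nn as nn

try:
    # flash-attn's fused CE kernel lowers peak reserved memory and is faster.
    from flash_attn.losses.cross_entropy import CrossEntropyLoss
except ImportError:
    # Fall back to stock PyTorch when flash-attn isn't installed.
    CrossEntropyLoss = nn.CrossEntropyLoss  # type: ignore[assignment]

loss_fn = CrossEntropyLoss(ignore_index=-100)
```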
# What does this PR do?

Bumps peft to a minimum of 0.12.0, since it contains a crucial memory-saving fix when saving adapters.

# What issue(s) does this change relate to?

#...