Teng-xu
*Issue #, if available:* *Description of changes:* *Testing done:* ## Merge Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR....
Update GPT-J Model training example with Tensor Parallelism using SageMaker Model Parallel Library
*Issue #, if available:* *Description of changes:* Update GPT-J Model training example with Tensor Parallelism using SageMaker Model Parallel Library. Update testing scripts to enable latest features with smp. *Testing...
I was trying to install apex through dockerfile (python3.6 cuda11.1) via the following commands ``` RUN git clone https://github.com/NVIDIA/apex && \ cd apex && \ pip install -v --no-cache-dir --global-option="--cpp_ext"...
### System Info - `transformers` version: 4.37.1 - Platform: Linux-5.10.199-190.747.amzn2.x86_64-x86_64-with-glibc2.31 - Python version: 3.10.8 - Huggingface_hub version: 0.20.2 - Safetensors version: 0.3.3 - Accelerate version: 0.26.1 - Accelerate config: not...
**Describe the bug** During our training sessions utilizing Megatron's Mixture of Experts (MoE) layers, we observed a decline in performance occurring at specific steps, with this deterioration manifesting sporadically and...
**Description:** When training with GroupedMLP and Tensor Parallel (TP) enabled, and `gated_linear_unit` is activated, the activation function is applied to fc1_output. Assuming a TP degree of 2, this intermediate output...
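The description above is cut off, so the following is only a minimal sketch of the general interaction between a gated linear unit and tensor-parallel sharding, not a reproduction of the reported behavior. It assumes a fused `fc1_output` laid out as `[gate | up]`, a SiLU activation (the excerpt does not name the activation), and a TP degree of 2. All function names are hypothetical and are not Megatron APIs.

```python
import math


def silu(x):
    # Assumed activation; the original excerpt does not say which one is used.
    return x / (1.0 + math.exp(-x))


def glu(fc1_output):
    """Split a fused [gate | up] vector in half and return silu(gate) * up."""
    half = len(fc1_output) // 2
    gate, up = fc1_output[:half], fc1_output[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


def shard_contiguous(fc1_output, tp):
    """Hypothetical naive split: each rank takes one contiguous slice."""
    size = len(fc1_output) // tp
    return [fc1_output[r * size:(r + 1) * size] for r in range(tp)]


def shard_per_half(fc1_output, tp):
    """Hypothetical aligned split: each rank takes matching gate and up slices."""
    half = len(fc1_output) // 2
    gate, up = fc1_output[:half], fc1_output[half:]
    size = half // tp
    return [gate[r * size:(r + 1) * size] + up[r * size:(r + 1) * size]
            for r in range(tp)]


def glu_sharded(shards):
    """Apply the GLU independently on every rank's shard, then concatenate."""
    out = []
    for shard in shards:
        out.extend(glu(shard))
    return out


# Toy fused output: first four values are the gate, last four are the up projection.
fc1_output = [0.5, -1.0, 2.0, 1.5, 3.0, -2.0, 0.25, 4.0]
TP = 2

reference = glu(fc1_output)
naive = glu_sharded(shard_contiguous(fc1_output, TP))
aligned = glu_sharded(shard_per_half(fc1_output, TP))

print("aligned matches reference:", aligned == reference)
print("naive matches reference:", naive == reference)
```

In this toy setup the contiguous split hands rank 0 only gate values and rank 1 only up values, so the per-rank halving pairs the wrong elements, while the per-half split reproduces the unsharded result.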