Teng-xu issues

Results 7 issues of Teng-xu

Add GPT-2 Custom Model training example with Tensor Parallelism using SageMaker Model Parallel Library

*Issue #, if available:* *Description of changes:* *Testing done:* ## Merge Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR....

Add GPT-Neo Model training example with Tensor Parallelism using SageMaker Model Parallel Library

*Issue #, if available:* *Description of changes:* *Testing done:* ## Merge Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR....

Update GPT-J Model training example with Tensor Parallelism using SageMaker Model Parallel Library

*Issue #, if available:* *Description of changes:* Update GPT-J Model training example with Tensor Parallelism using SageMaker Model Parallel Library. Update testing scripts to enable latest features with smp. *Testing...

Apex installation failed

I was trying to install apex through a Dockerfile (Python 3.6, CUDA 11.1) via the following commands:
```
RUN git clone https://github.com/NVIDIA/apex && \
    cd apex && \
    pip install -v --no-cache-dir --global-option="--cpp_ext"...
```

Llama 2 model divergence with FSDP

### System Info
- `transformers` version: 4.37.1
- Platform: Linux-5.10.199-190.747.amzn2.x86_64-x86_64-with-glibc2.31
- Python version: 3.10.8
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.3.3
- Accelerate version: 0.26.1
- Accelerate config: not...

[BUG] Performance drop while training with MoE

**Describe the bug** During our training sessions utilizing Megatron's Mixture of Experts (MoE) layers, we observed a decline in performance occurring at specific steps, with this deterioration manifesting sporadically and...

stale

[QUESTION] glu activation with tensor parallel in GroupedMLP

**Description:** When training with GroupedMLP and Tensor Parallel (TP) enabled, and `gated_linear_unit` is activated, the activation function is applied to fc1_output. Assuming a TP degree of 2, this intermediate output...