Minghao Yan issues


Results: 4 issues by Minghao Yan

Compatibility issue with CUDA 12.2

Branch/Tag/Commit: main
Docker Image Version: N/A
GPU name: A100
CUDA Driver: 535.54.03
Reproduced Steps: Install CUDA 12.2 and the newest driver; make -j12 would exit...

bug

In-place operations in the backward pass cause errors in asynchronous concurrent training with FSDP

Hi, I am trying to modify the LoRA training recipe with FSDP to support conditional training. To do this, I have two LoRA adapters in a module and use a binary variable...

Add: loading from HF checkpoints into an FSDP model fails

This is a minimal reproducible example for the issues discussed in #421.

CLA Signed

LoRA fine-tuning weight explosion in FSDP training

Dear authors, I encountered weight explosion while integrating LoRA into torchtitan. I am running the train_configs/llama3_8b.toml config with run_llama_train.sh on 4 A10 24GB GPUs. The PyTorch version is the latest...