victorzhz111

Results 2 issues of victorzhz111

When fine-tuning the alpaca-alora model, I applied the alora modules to the attention layers {q_proj, v_proj} and got NaN as the evaluation loss. However, if I applied the alora modules...

### Reminder

- [X] I have read the README and searched the existing issues.

### Reproduction

deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
--deepspeed /mnt/workspace/hanzhong/LLaMA-Factory/train_sh/ds_config_zero3.json \
--stage dpo \
--do_train \
...