Update requirements.txt
I got this error:
```
python3.10/site-packages/transformers-4.27.0.dev0-py3.10.egg/transformers/trainer.py", line 1460, in _wrap_model
    self.model = model = FSDP(
TypeError: FullyShardedDataParallel.__init__() got an unexpected keyword argument 'forward_prefetch'
```
torch 1.12 does not support `forward_prefetch`, so I switched torch to 1.13.1 and it works!
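If you want to double-check your environment before launching a long run, something like this should confirm both the torch version and whether its FSDP constructor actually accepts `forward_prefetch` (it prints `True` on 1.13+, `False` on 1.12):

```bash
# Print the installed torch version; forward_prefetch was added to FSDP in torch 1.13.
python -c "import torch; print(torch.__version__)"
# Print True if the FSDP constructor accepts the forward_prefetch keyword argument.
python -c "import inspect; from torch.distributed.fsdp import FullyShardedDataParallel as FSDP; print('forward_prefetch' in inspect.signature(FSDP.__init__).parameters)"
```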
Hi @tpoisonooo, are you able to run fine-tuning successfully?
> Hi @tpoisonooo, are you able to run fine-tuning successfully?
Yes. Here is my version table and run.sh on an A100:
| repo | commit-id / version |
|---|---|
| stanford_alpaca | eb5b171d9b103a12a8e14e0edca9cbc45fe1d512 |
| transformers | 68d640f7c368bcaaaecfc678f11908ebbd3d6176 |
| torch | 1.13.1 |
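For reproducibility, a minimal setup sketch pinning those exact revisions might look like this (assuming the standard GitHub locations for both repos):

```bash
pip install torch==1.13.1
# Install transformers at the exact commit from the table above.
pip install git+https://github.com/huggingface/transformers@68d640f7c368bcaaaecfc678f11908ebbd3d6176
# Check out stanford_alpaca at the matching commit.
git clone https://github.com/tatsu-lab/stanford_alpaca && cd stanford_alpaca
git checkout eb5b171d9b103a12a8e14e0edca9cbc45fe1d512
pip install -r requirements.txt
```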
```bash
$ cat run.sh
CUDA_VISIBLE_DEVICES=7,6,5,4,2,1,3,0 torchrun --nproc_per_node=4 --master_port=9999 train.py \
    --model_name_or_path ${PATH_TO_LLAMA} \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir ${OUTPUT_PATH} \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True
```
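Note that with these flags the effective global batch size is per_device_train_batch_size × gradient_accumulation_steps × nproc_per_node = 2 × 4 × 4 = 32.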
You can also try alpaca-lora with the transformers master branch to save memory.
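For reference, a hypothetical invocation might look like the following; the entry point and flag names here are assumptions, so check the alpaca-lora README for the actual ones:

```bash
git clone https://github.com/tloen/alpaca-lora && cd alpaca-lora
pip install git+https://github.com/huggingface/transformers  # master branch
# finetune.py and its --base_model/--data_path/--output_dir flags are assumed from the alpaca-lora README.
python finetune.py \
    --base_model ${PATH_TO_LLAMA} \
    --data_path ./alpaca_data.json \
    --output_dir ${OUTPUT_PATH}
```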