stanford_alpaca
Anyone fine tuned successfully on 1 or 2 GPUs?
I cannot get the train.py script running (on 2 x 4090 GPUs). I got this error:
File ".../alp/lib/python3.10/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}") ValueError: Some specified arguments are not used by the HfArgumentParser: ['--LlamaDecoderLayer', 'LLaMADecoderLayer']
Has anyone jumped through this hoop? Thanks!
My command was (I also tried --nproc_per_node=1):

torchrun --nproc_per_node=2 train.py \
    --model_name_or_path ./output \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir ./output2 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --LlamaDecoderLayer 'LLaMADecoderLayer' \
    --tf32 True
(1) I didn't specify --master_port. Is this okay, and what does this flag do? I also tried a few random ports; that didn't work either.
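For what it's worth, --master_port just sets the TCP port that rank 0 uses for the distributed rendezvous; torchrun falls back to a default (29500) when it is omitted, so leaving it out is fine as long as nothing else is bound to that port. A minimal sketch, with an arbitrary free port:

# --master_port only matters if the default port (29500) is already in use,
# e.g. by another training job on the same machine. Any free port works;
# the value below is arbitrary.
torchrun --nproc_per_node=2 --master_port=29501 train.py \
    --model_name_or_path ./output \
    --data_path ./alpaca_data.json \
    --output_dir ./output2
# (remaining flags as in the full command above)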
(2) --model_name_or_path ./output, where output contains: config.json, generation_config.json, pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, pytorch_model.bin.index.json, tokenizer.model, tokenizer_config.json, special_tokens_map.json
(3) --output_dir ./output2, where output2 is an empty dir
(4) I used --LlamaDecoderLayer 'LLaMADecoderLayer'
I am experiencing the same problem running on 2 x 3090 GPUs.
I get out-of-memory errors trying on 2 x 3060. 8 x A100 amounts to 640 GB of memory! My setup has just 22 GB. I am wondering how I can fit this giant into the cards.
Hi ashkan-leo, can you share your command? I'd like to diff it against mine. Out-of-memory errors are better than the one I got. Thanks!
The mistake I made was using
--LlamaDecoderLayer 'LLaMADecoderLayer'
when I should have used
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'
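For anyone hitting the same error, the full corrected invocation (identical to my original command, with only that one flag replaced) would be:

torchrun --nproc_per_node=2 train.py \
    --model_name_or_path ./output \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir ./output2 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True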
However, I still got an OutOfMemoryError.
With --nproc_per_node=1:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 774.00 MiB (GPU 0; 23.65 GiB total capacity; 21.86 GiB already allocated; 476.00 MiB free; 21.92 GiB reserved in total by PyTorch)
With --nproc_per_node=2:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 388.00 MiB (GPU 0; 23.65 GiB total capacity; 20.65 GiB already allocated; 353.44 MiB free; 21.71 GiB reserved in total by PyTorch)
More GPUs with more memory should help. I need to try 8 x A100 40GB GPUs.
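These OOMs line up with a back-of-the-envelope estimate: full fine-tuning with Adam holds roughly 16 bytes per parameter (2 B bf16 weights + 2 B bf16 gradients + 4 B fp32 master weights + 8 B fp32 Adam moments), so a 7B-parameter model needs on the order of 112 GB before activations. FSDP shards that across GPUs, so 2 x 24 GB falls far short while 8 x A100 is comfortable. One thing that might help on small cards, if I'm reading the HF Trainer options right: the --fsdp string also accepts "offload" (CPU offloading of parameters and gradients), trading a lot of speed for memory. A sketch under that assumption:

# "offload" is an assumption about the HF Trainer's --fsdp option here;
# it moves FSDP-sharded params/grads to CPU RAM, which is much slower.
torchrun --nproc_per_node=2 train.py \
    --model_name_or_path ./output \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir ./output2 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
# (training hyperparameter flags omitted for brevity; reuse the ones from the full command above)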