
Wondering how to run inference after fine-tuning.

Open · 5taku opened this issue 11 months ago · 0 comments

Hi, I fine-tuned the LLaMA 7B model using the Alpaca code.

Below is the command I ran.

CUDA_VISIBLE_DEVICES=2 torchrun --nproc_per_node=1 --master_port=8090 train.py \
    --model_name_or_path ./model/weight/7B \
    --data_path /home/sulki/project/devops/my_own_data.json \
    --bf16 True \
    --output_dir ./output_my_own_data/7B \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --deepspeed "./configs/default_offload_opt_param.json" \
    --tf32 True > my_own_data.log

After some time, fine-tuning completed. Below is the final part of the log.

[2023-07-20 18:05:19,608] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-20 18:05:19,608] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-20 18:05:29,973] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 6.74B parameters
ninja: no work to do.
Time to load cpu_adam op: 2.9780356884002686 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
{'loss': 0.93, 'learning_rate': 1.9131861575179e-05, 'epoch': 0.22}
{'loss': 0.8989, 'learning_rate': 1.7640214797136038e-05, 'epoch': 0.43}
[2023-07-21 10:45:40,546] [WARNING] [stage3.py:1850:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.8909, 'learning_rate': 1.614856801909308e-05, 'epoch': 0.65}
{'loss': 0.8855, 'learning_rate': 1.4656921241050121e-05, 'epoch': 0.87}
{'loss': 0.8753, 'learning_rate': 1.316527446300716e-05, 'epoch': 1.09}
{'loss': 0.8607, 'learning_rate': 1.1673627684964201e-05, 'epoch': 1.3}
{'loss': 0.8557, 'learning_rate': 1.0181980906921243e-05, 'epoch': 1.52}
{'loss': 0.8521, 'learning_rate': 8.690334128878282e-06, 'epoch': 1.74}
{'loss': 0.8464, 'learning_rate': 7.198687350835323e-06, 'epoch': 1.95}
{'loss': 0.7611, 'learning_rate': 5.707040572792363e-06, 'epoch': 2.17}
{'loss': 0.7234, 'learning_rate': 4.2153937947494036e-06, 'epoch': 2.39}
{'loss': 0.7142, 'learning_rate': 2.723747016706444e-06, 'epoch': 2.6}
{'loss': 0.7086, 'learning_rate': 1.2321002386634846e-06, 'epoch': 2.82}
{'train_runtime': 403223.9352, 'train_samples_per_second': 2.194, 'train_steps_per_second': 0.017, 'train_loss': 0.8234077206364384, 'epoch': 3.0}

Below is the generated output_dir:

├── added_tokens.json
├── checkpoint-6000
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   ├── global_step6000
│   │   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   └── zero_pp_rank_0_mp_rank_00_model_states.pt
│   ├── latest
│   ├── rng_state.pth
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── config.json
├── generation_config.json
├── global_step6912
│   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   └── zero_pp_rank_0_mp_rank_00_model_states.pt
├── latest
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.model
├── trainer_state.json
├── training_args.bin
└── zero_to_fp32.py

Here's what I've tried:

  1. Run zero_to_fp32.py
python zero_to_fp32.py ./output_my_own_data/7B/checkpoint-6000 ./output_my_own_data/7B/result.bin

I set the checkpoint path to output_my_own_data/7B/checkpoint-6000 and the output path to output_my_own_data/7B/result.bin, and result.bin was created (shown in the tree below).

.
├── added_tokens.json
├── checkpoint-6000
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   ├── global_step6000
│   │   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   └── zero_pp_rank_0_mp_rank_00_model_states.pt
│   ├── latest
│   ├── rng_state.pth
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── config.json
├── generation_config.json
├── global_step6912
│   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   └── zero_pp_rank_0_mp_rank_00_model_states.pt
├── latest
├── result.bin        <- newly created by zero_to_fp32.py
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.model
├── trainer_state.json
├── training_args.bin
└── zero_to_fp32.py
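
Before loading it into Transformers, a quick sanity check on result.bin could look like this (just a sketch on my side, assuming zero_to_fp32.py writes a plain FP32 state_dict with torch.save, so torch.load should give back a dict of parameter tensors):

import torch

# Assumption: result.bin is a plain FP32 state_dict saved by zero_to_fp32.py.
# Note: a 7B model in FP32 is roughly 27 GB, so this needs a lot of host RAM.
state_dict = torch.load("./output_my_own_data/7B/result.bin", map_location="cpu")
print(type(state_dict), len(state_dict))  # expect a dict with one entry per parameter
print(next(iter(state_dict)))             # name of the first parameter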

I then tried to load the model as below.

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("./output_my_own_data/7B/tokenizer.model")
model = LlamaForCausalLM.from_pretrained("./output_my_own_data/7B/result.bin")

print()

The tokenizer loads fine, but loading the model raises an error.

Exception has occurred: OSError
It looks like the config file at '/home/sulki/project/stanford_alpaca-main/output_tawos_34/7B/result.bin' is not a valid JSON file.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

  File "/home/sulki/project/stanford_alpaca-main/inference copy.py", line 4, in <module>
    model = LlamaForCausalLM.from_pretrained("/home/sulki/project/stanford_alpaca-main/output_tawos_34/7B/result.bin")
OSError: It looks like the config file at '/home/sulki/project/stanford_alpaca-main/output_tawos_34/7B/result.bin' is not a valid JSON file.
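
In case it helps, this is what I was planning to try next (just a sketch on my part; my assumption is that from_pretrained expects a directory containing config.json plus a pytorch_model.bin, rather than the path to the .bin file itself, so result.bin would first need to be copied or renamed):

import shutil
from transformers import LlamaForCausalLM, LlamaTokenizer

out_dir = "./output_my_own_data/7B"

# Assumption: Transformers looks for pytorch_model.bin inside a model directory,
# so give the converted FP32 weights that default filename.
shutil.copy(f"{out_dir}/result.bin", f"{out_dir}/pytorch_model.bin")

tokenizer = LlamaTokenizer.from_pretrained(out_dir)  # pass the directory, not tokenizer.model
model = LlamaForCausalLM.from_pretrained(out_dir)    # pass the directory, not result.bin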

Which part is the problem, and what is the right way to run inference on the fine-tuned model?

5taku · Jul 26 '23 09:07