
Training Alpaca on a small subset of the official dataset, but the outputs come out messy

Open ZeyuTeng96 opened this issue 1 year ago • 1 comment

Hi, can anyone help me with this? I tried to train on a small subset of the official dataset, but the training loss is quite high, and the responses generated by the trained model are messy.

The training shell script is:

```bash
CUDA_VISIBLE_DEVICES=1 nohup python -u train.py \
    --model_name_or_path "decapoda-research/llama-7b-hf" \
    --data_path ./alpaca_test.json \
    --output_dir ./res \
    --num_train_epochs 10 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --deepspeed ./ds_config.json > train.log 2>&1 &
```
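For reference, here is a rough sketch of the effective batch size these flags imply (illustrative only, not part of train.py; it assumes a single visible GPU, since CUDA_VISIBLE_DEVICES exposes one device in the command above):

```python
# Back-of-the-envelope check of the effective batch size implied by the flags above.
# Assumes 1 GPU; adjust num_gpus if more devices are visible.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32 examples per optimizer step
```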

Also, I modified two lines of the official train.py script:

- line 195 -> `model = transformers.LlamaForCausalLM.from_pretrained(`
- line 200 -> `tokenizer = transformers.LlamaTokenizer.from_pretrained(`
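For context, this is roughly what those two calls look like after the edit. The surrounding keyword arguments are from memory of the repo's train.py (where `model_args` and `training_args` come from its HfArgumentParser), so they may differ slightly in your copy:

```python
import transformers

# Line ~195: swap AutoModelForCausalLM for the LLaMA-specific class.
model = transformers.LlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
)

# Line ~200: swap AutoTokenizer for the LLaMA-specific tokenizer.
tokenizer = transformers.LlamaTokenizer.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    model_max_length=training_args.model_max_length,
    padding_side="right",
    use_fast=False,
)
```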

The training log is weird:

```
{'loss': 20.1413, 'learning_rate': 2.1085949060360654e-06, 'epoch': 0.46}
{'loss': 21.5844, 'learning_rate': 1.4016954246529697e-05, 'epoch': 0.91}
```

These are the losses for the first two training steps. If I do not apply gradient accumulation, the training loss is normal (around 1.5 - 2).
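For what it's worth, dividing the logged values by the 16 accumulation steps lands close to the "normal" range I see without accumulation; this is just arithmetic on the numbers above, under the (unconfirmed) assumption that the logger is summing per-micro-batch losses over an accumulation window rather than averaging them:

```python
# Arithmetic on the logged values: divide by gradient_accumulation_steps to see
# whether the reported loss could simply be a sum over the accumulation window.
logged_losses = [20.1413, 21.5844]
gradient_accumulation_steps = 16
print([round(loss / gradient_accumulation_steps, 3) for loss in logged_losses])  # [1.259, 1.349]
```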

The inference script is:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

tokenizer = LlamaTokenizer.from_pretrained(
    "./checkpoint-90",
    add_eos_token=True,
)

model = LlamaForCausalLM.from_pretrained("./checkpoint-90")
model = model.to("cuda:1")


def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"""


prompt = generate_prompt("What are the three primary colors?")

inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda:1")

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        max_new_tokens=600,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        return_dict_in_generate=True,
        output_scores=True,
    )

s = generation_output.sequences[0]
output = tokenizer.decode(s)

res = output.split("### Response:")[1].strip()
```
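Since generate() is given tokenizer.eos_token_id and tokenizer.pad_token_id, one quick sanity check is to print what the loaded tokenizer actually reports for its special tokens (a sketch using standard transformers attributes; nothing specific to this checkpoint is assumed):

```python
# Inspect the special tokens the loaded tokenizer reports, since their ids are
# passed straight into model.generate() above.
print("bos:", tokenizer.bos_token, tokenizer.bos_token_id)
print("eos:", tokenizer.eos_token, tokenizer.eos_token_id)
print("pad:", tokenizer.pad_token, tokenizer.pad_token_id)
```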

For the test instruction "What are the three primary colors?", the output is:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What are the three primary colors?

### Response: # Fokker S.11

The Fokker S.11 was a small, three-seater biplane flown by the German Luftstreitkräfte (air force) during the First World War. It was an unarmed reconnaissance aircraft that was used in large numbers, but was inferior to the Allied aircraft of the period. It was a development of the Fokker M.5K, a smaller version of the Fokker M.5. The S.11 was introduced in 1917 and was the last of the Fokker biplanes to be produced. It was a successful design, with a total of 2,100 examples built. The S.11 was also used by the United States Army Air Service, which was the forerunner of the United States Air Force.
```

ZeyuTeng96 · Mar 21 '23 13:03