transformers
Fine-tuned LLaMA model gives the same responses as the original pretrained model after merging the LoRA weights
I have fine-tuned a Llama-2-7b-chat-hf model and saved the adapter weights. After loading the adapter weights and merging them into the base model, the new model and the old model give me the same responses. Does this imply that the model was not trained properly, or that the base model was already trained on the given data? Please find the code below.
This is the code for loading the base model ("NousResearch/Llama-2-7b-chat-hf"):
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
model.config.use_cache = False      # disable the KV cache during training
model.config.pretraining_tp = 1
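For completeness, the quantization variables referenced above are defined earlier in my notebook. Roughly, assuming the usual QLoRA defaults (the exact values in my run may differ):

model_name = "NousResearch/Llama-2-7b-chat-hf"
use_4bit = True                      # load the base model in 4-bit
bnb_4bit_compute_dtype = "float16"   # compute dtype for the 4-bit layers
bnb_4bit_quant_type = "nf4"          # NF4 quantization
use_nested_quant = False             # no double quantization
device_map = {"": 0}                 # put the whole model on GPU 0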
Loading the adapter and merging it into the base model weights:
from peft import PeftModel

new_model = PeftModel.from_pretrained(model, '/kaggle/input/finetuned-model/pytorch/default/3')
new_model = new_model.merge_and_unload()
I also get this warning while performing the merge:
Merge lora module to 4-bit linear may get different generations due to rounding errors.
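The warning suggests that merging into a 4-bit model can itself introduce rounding error. A workaround I have considered (not what I ran above) is to merge the adapter into a non-quantized copy of the base model instead; a rough sketch, assuming the same adapter path and enough memory for the fp16 weights:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16, without 4-bit quantization, just for merging
base_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Apply the LoRA adapter and fold it into the fp16 weights
merged = PeftModel.from_pretrained(base_fp16, '/kaggle/input/finetuned-model/pytorch/default/3')
merged = merged.merge_and_unload()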
Comparing both models' responses:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # Fix weird overflow issue with fp16 training

queries = [
    'Do octopuses have three hearts',
]

def transform_conversation(example):
    # Wrap the query in the Llama-2 chat prompt template
    return f'<s>[INST] {example} [/INST]'

new_queries = [transform_conversation(query) for query in queries]

model_inputs = tokenizer(new_queries, return_tensors="pt", padding=True).to("cuda")

generated_ids1 = model.generate(**model_inputs, max_new_tokens=100)      # base model
generated_ids2 = new_model.generate(**model_inputs, max_new_tokens=100)  # merged model
print("Old Model Response\n")
print(tokenizer.batch_decode(generated_ids1, skip_special_tokens=True)[0])
print("Fine Tuned Model Response")
print(tokenizer.batch_decode(generated_ids2, skip_special_tokens=True)[0])
I got these responses:
Old Model Response
[INST] Do octopuses have three hearts [/INST] Octopuses have three hearts. Two branchial hearts pump blood through each of the two gills, while the third is a systemic heart that pumps blood through the body. Octopus blood contains the copper-rich protein hemocyanin for transporting oxygen. They also have gills, suckers, and tentacles. Octopuses are very intelligent. Octopuses have three hearts. Two of them are located in the gills and pump blood
Fine Tuned Model Response
[INST] Do octopuses have three hearts [/INST] Octopuses have three hearts. Two branchial hearts pump blood through each of the two gills, while the third is a systemic heart that pumps blood through the body. Octopus blood contains the copper-rich protein hemocyanin for transporting oxygen. In octopuses, this heart is located in the head, with the gills extending from the head below. Octopuses also have a renal heart. This is the only heart in the animal kingdom
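One sanity check I have not run yet: PEFT initializes the lora_B matrices to zero, so if they are still all zero after training, the adapter is a no-op and the merged model would match the base exactly. A rough sketch of that check (applied to a fresh copy of the base model, before any merge):

from peft import PeftModel

adapter = PeftModel.from_pretrained(model, '/kaggle/input/finetuned-model/pytorch/default/3')

# A non-zero maximum here means the adapter actually changes the weights
for name, param in adapter.named_parameters():
    if "lora_B" in name:
        print(name, param.abs().max().item())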
Please provide some insights if I am missing something.