
Fine-tuned LLaMA model gives the same responses as the old pretrained model after merging the LoRA weights into the old model

abhi201002 opened this issue 1 year ago

I have fine-tuned a Llama-2-7b-chat-hf model and saved the adapter weights. After loading the adapter weights and merging them into the old model, both the new model and the old model give me the same responses. Does this imply that the model was not trained properly, or that the old model was already trained on the given data? Please find the code below.
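For context, the adapter came out of the usual QLoRA / SFTTrainer recipe; I haven't pasted the training cell, so the snippet below is only a rough sketch of how the adapter weights were saved (the LoRA hyperparameters, dataset, and output path here are placeholders, not my exact values):

from peft import LoraConfig
from trl import SFTTrainer

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()

# Only the LoRA adapter weights are saved here, not the merged model
trainer.model.save_pretrained("finetuned-model")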

This is the code for loading the base model ("NousResearch/Llama-2-7b-chat-hf"):
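The quantization settings referenced in the snippet come from a config cell I haven't pasted; they follow the common QLoRA defaults, roughly:

model_name = "NousResearch/Llama-2-7b-chat-hf"
device_map = {"": 0}
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False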

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

Loading the adapter and merging it with the old model's weights:

from peft import PeftModel

# Note: PeftModel.from_pretrained injects the LoRA layers into `model` in place,
# so after merge_and_unload() both variables end up pointing at merged weights.
new_model = PeftModel.from_pretrained(model, '/kaggle/input/finetuned-model/pytorch/default/3')
new_model = new_model.merge_and_unload()
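As a sanity check on the adapter itself (a sketch I would run right after PeftModel.from_pretrained, before merge_and_unload()): LoRA B matrices are initialized to zero, so if every lora_B tensor is still all zeros, the adapter contributes nothing and the merged model is identical to the base model by construction.

# Run on the PeftModel before merging; a trained adapter should show non-zero values
for name, param in new_model.named_parameters():
    if "lora_B" in name:
        print(name, param.abs().max().item())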

I am also getting this warning while performing the merge:

Merge lora module to 4-bit linear may get different generations due to rounding errors.
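If that rounding warning matters, one option (a sketch, not what I ran) is to merge the adapter into a non-quantized copy of the base model instead, e.g. reloaded in fp16:

# Reload the base model without 4-bit quantization and merge the adapter into it
base_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map=device_map,
)
merged_fp16 = PeftModel.from_pretrained(
    base_fp16, '/kaggle/input/finetuned-model/pytorch/default/3'
).merge_and_unload()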

Comparing the responses of both models:

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

queries = [
    'Do octopuses have three hearts',
]

def transform_conversation(example):
    result = f'<s>[INST] {example} [/INST]'
    return result
    
new_queries = []
for query in queries:
    new_queries.append(transform_conversation(query))
modelInputs = tokenizer(
    new_queries, return_tensors="pt", padding=True
).to("cuda")

generated_ids1 = model.generate(**modelInputs, max_new_tokens=100)
generated_ids2 = new_model.generate(**modelInputs, max_new_tokens=100)

print("Old Model Response\n")
print(tokenizer.batch_decode(generated_ids1, skip_special_tokens=True)[0])

print("Fine Tuned Model Response")
print(tokenizer.batch_decode(generated_ids2, skip_special_tokens=True)[0])
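(If the model's default generation config enables sampling, run-to-run randomness can blur the comparison; forcing greedy decoding makes both outputs deterministic, e.g.:)

# Deterministic comparison: disable sampling for both models
generated_ids1 = model.generate(**modelInputs, max_new_tokens=100, do_sample=False)
generated_ids2 = new_model.generate(**modelInputs, max_new_tokens=100, do_sample=False)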

These are the responses I got:

Old Model Response

[INST] Do octopuses have three hearts [/INST] Octopuses have three hearts. Two branchial hearts pump blood through each of the two gills, while the third is a systemic heart that pumps blood through the body. Octopus blood contains the copper-rich protein hemocyanin for transporting oxygen. They also have gills, suckers, and tentacles. Octopuses are very intelligent. Octopuses have three hearts. Two of them are located in the gills and pump blood
Fine Tuned Model Response
[INST] Do octopuses have three hearts [/INST] Octopuses have three hearts. Two branchial hearts pump blood through each of the two gills, while the third is a systemic heart that pumps blood through the body. Octopus blood contains the copper-rich protein hemocyanin for transporting oxygen. In octopuses, this heart is located in the head, with the gills extending from the head below. Octopuses also have a renal heart. This is the only heart in the animal kingdom

Please share some insights in case I am missing something.

abhi201002 · Oct 19 '24, 05:10