
Inference discrepancies after merging weights into a LoRA model

mark-myzhao opened this issue 10 months ago • 4 comments

Describe the bug

We noticed discrepancies in the inference results between the model with the LoRA adapter loaded on the fly and the model with the LoRA weights merged in.

Steps/Code to reproduce bug

  1. Fine-tune a LoRA model with train_gpt_sft.py in NeMo-Aligner (in our case, we used Mistral-7B as the base model);
  2. Run inference with the LoRA adapter loaded on the fly using megatron_gpt_generate.py;
  3. Merge the LoRA weights into the base model using merge.py (the underlying merge math is sketched after this list);
  4. Run inference on the merged-weight model using the same megatron_gpt_generate.py script (while skipping the LoRA checkpoint load).
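
For context, the standard LoRA merge folds the low-rank update into the frozen base weight as W' = W + (alpha / r) * B A. The sketch below illustrates that math in plain PyTorch; it is not the actual merge.py implementation, and the function name, shapes, and fp32 accumulation are assumptions for illustration.

```python
# Minimal sketch of the standard LoRA merge math (NOT the actual merge.py logic).
# Shapes and the fp32 accumulation are illustrative assumptions.
import torch

def merge_lora_weight(base_weight: torch.Tensor,   # (out_features, in_features)
                      lora_a: torch.Tensor,        # (rank, in_features)
                      lora_b: torch.Tensor,        # (out_features, rank)
                      alpha: float,
                      rank: int) -> torch.Tensor:
    """Fold a LoRA update into a base weight: W' = W + (alpha / rank) * B @ A."""
    scaling = alpha / rank
    # Accumulate in fp32 to limit rounding error, then cast back to the base dtype.
    update = (lora_b.float() @ lora_a.float()) * scaling
    return (base_weight.float() + update).to(base_weight.dtype)
```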

During this process, we found:

  • The inference results from step 2 and step 4 differ substantially
  • The validation inference results produced inside the LoRA weight merge script (step 3) differed from both step 2 and step 4

(samples of the model responses are pasted in the Additional context section)

Expected behavior

The inference results from the merged-weight model and from the model with the LoRA adapter loaded should be identical under the same inference config:

inference:
  greedy: True
  add_BOS: True
  tokens_to_generate: 1024
  all_probs: False
  repetition_penalty: 1.2
  min_tokens_to_generate: 0
  compute_logprob: False
  end_strings: ["<|endoftext|>"]
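
Mathematically, the on-the-fly path h = Wx + (alpha / r) * B(Ax) and the merged path h = W'x agree up to floating-point rounding, so greedy decoding should produce (near-)identical outputs. Below is a standalone sanity sketch of that equivalence in plain PyTorch; it is not NeMo code, and all sizes, names, and dtypes are illustrative.

```python
# Standalone sanity check (plain PyTorch, not NeMo): on-the-fly LoRA vs. merged
# weights should agree up to floating-point rounding. All sizes are illustrative.
import torch

torch.manual_seed(0)
d_in, d_out, rank, alpha = 1024, 1024, 16, 32
scaling = alpha / rank

w = torch.randn(d_out, d_in, dtype=torch.bfloat16)
a = torch.randn(rank, d_in, dtype=torch.bfloat16) * 0.01
b = torch.randn(d_out, rank, dtype=torch.bfloat16) * 0.01
x = torch.randn(8, d_in, dtype=torch.bfloat16)

# On-the-fly adapter: h = x W^T + scaling * (x A^T) B^T
h_adapter = x @ w.t() + scaling * ((x @ a.t()) @ b.t())

# Merged weight: W' = W + scaling * B A, then h = x W'^T
w_merged = w + scaling * (b @ a)
h_merged = x @ w_merged.t()

# The two paths should differ only by rounding; a large gap would indicate a bug.
print((h_adapter - h_merged).abs().max())
```

Note that even a rounding-level logit difference can flip a single greedy argmax, and with repetition_penalty applied the two generations can then diverge completely, which may explain why the outputs differ so much rather than slightly.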

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of NeMo install: install from source

Environment details

If an NVIDIA Docker image is used, you don't need to specify these.

  • Nvidia PyTorch Docker: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-01.html
  • NeMo version: https://github.com/NVIDIA/NeMo/commit/9b64e390b534d4eb5ad7f28502bcfe4c7f0c6c39
  • NeMo-Aligner version: https://github.com/NVIDIA/NeMo-Aligner/commit/ea78731d9fd86e822b0253fca8a10e0e8a4526c9

Additional context

Samples of the inference results

  • Inference result from Step 2 (Base Mistral + LoRA Adapter loaded on the fly)
# Prompt: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

Include photos and thoughtful captions that convey the beauty and aloha spirit of the islands.\n\nAloha friends,\n\nI recently had the pleasure of visiting the beautiful islands of Hawaii for the first time. As a travel blogger, I was eager to experience the rich history and culture of this tropical paradise. My trip was filled with once-in-a-lifetime experiences

...
  • Inference result from Step 4 (LoRA model with merged weights)
# Prompt: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

Here is a draft with key points to share a fun-filled vacation:\n\nAloha! I recently returned from a fun-filled trip to the big island of Hawaii. I was so blessed to have a family member visit and Hawaiian cultural immersion while there.

...
  • Inference result from Step 3 (validation inference inside merge.py)
# Prompt: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

Here is a draft:\n\nAloha! I recently returned from a dream vacation in the beautiful islands of Hawaii. I had such a rejuvenating and eye-opening experience there. This laid-back and peaceful paradise really is a special place.\n\nI visited both Big Island and Oahu, and it was such a treat to see the lava flows and waterfalls, and the lush green volcanic crater on Kilauea

...

mark-myzhao avatar Apr 16 '24 19:04 mark-myzhao

@mark-myzhao Thanks, we will dig further. In the meantime, can you try training the LoRA weights using https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py

arendu avatar Apr 25 '24 22:04 arendu

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 27 '24 01:05 github-actions[bot]

I ran into the same issue, and I also found that merging the LoRA on a 3090 gives a different result than on an A100; by different I mean the two merged models have different MD5 hashes.
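
A file-level MD5 is a blunt comparison: byte-identical files imply identical tensors, but the converse does not hold, and small numeric drift (e.g., different GPU matmul kernels on a 3090 vs. an A100) will change the hash even when the tensors are numerically close. Below is a hedged sketch of a tensor-level comparison instead, assuming the two checkpoints can be loaded as plain state dicts with torch.load; real .nemo archives may need NeMo's own loading path, and the paths are placeholders.

```python
# Sketch: compare two merged checkpoints tensor-by-tensor rather than by file MD5.
# Assumes both files load as plain state dicts via torch.load; paths are placeholders.
import torch

def max_tensor_diff(ckpt_a_path: str, ckpt_b_path: str) -> float:
    sd_a = torch.load(ckpt_a_path, map_location="cpu")
    sd_b = torch.load(ckpt_b_path, map_location="cpu")
    worst = 0.0
    for name, ta in sd_a.items():
        tb = sd_b.get(name)
        if not torch.is_tensor(ta) or not torch.is_tensor(tb):
            continue  # skip metadata entries and missing keys
        diff = (ta.float() - tb.float()).abs().max().item()
        if diff > worst:
            worst = diff
            print(f"{name}: max abs diff {diff:.3e}")
    return worst
```

A maximum difference at rounding scale would point to kernel-level non-determinism across GPUs rather than a logic bug in the merge itself.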

EarthXP avatar May 30 '24 12:05 EarthXP

Can I ask a naive question: what temperature are you using to generate the answers? I would like to understand whether it would always generate the same response or whether you are simulating creativity.

carlosclaiton avatar Jun 21 '24 03:06 carlosclaiton

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 22 '24 01:07 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jul 29 '24 01:07 github-actions[bot]

Hi @mark-myzhao I'm facing the same issue, did you find a solution?

TomekPro avatar Aug 02 '24 13:08 TomekPro