LLaVA
[Question] Training with Qwen2 backend got loss 0
Question
I got a loss of 0 when training with the Qwen2 backend:
{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0}
0%|▎ | 20/8720 [01:38<11:01:39, 4.56s/it]WARNING: tokenization mismatch: 47 vs. 48. (ignored)
WARNING: tokenization mismatch: 54 vs. 55. (ignored)
WARNING: tokenization mismatch: 46 vs. 47. (ignored)
WARNING: tokenization mismatch: 43 vs. 44. (ignored)
What could have caused this?
me too
I found that the cause of this problem is a difference in tokenizer rules: in the Qwen tokenizer configuration, the bos_token is null and the eos_token is set to "<|endoftext|>". Because of this, every round is flagged as a tokenization mismatch, the whole target gets masked with IGNORE_INDEX, and the loss collapses to 0. So I added a Qwen rule in /mnt2/yinxie/code/LLaVA/llava/conversation.py as follows:
class SeparatorStyle(Enum):
    """Different separator style."""
    SINGLE = auto()
    TWO = auto()
    MPT = auto()
    PLAIN = auto()
    LLAMA_2 = auto()
    QWEN_2 = auto()
def get_prompt(self):
    ...
    # New QWEN_2 branch, added to the existing if/elif chain over self.sep_style:
    elif self.sep_style == SeparatorStyle.QWEN_2:
        seps = [self.sep, self.sep2]
        ret = self.system + seps[0]
        for i, (role, message) in enumerate(messages):
            if message:
                if type(message) is tuple:
                    message, _, _ = message
                ret += role + ": " + message + seps[i % 2]
            else:
                ret += role + ":"
conv_qwen_2 = Conversation(
    system="A chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.",
    roles=("USER", "ASSISTANT"),
    version="qwen_v2",
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.QWEN_2,
    sep=" ",
    sep2="<|endoftext|>",
)
conv_templates = {
    "default": conv_vicuna_v0,
    "v0": conv_vicuna_v0,
    "v1": conv_vicuna_v1,
    "vicuna_v1": conv_vicuna_v1,
    "qwen_2": conv_qwen_2,
    "llama_2": conv_llama_2,
    "mistral_instruct": conv_mistral_instruct,
    "chatml_direct": conv_chatml_direct,
    "mistral_direct": conv_chatml_direct,
    "plain": conv_llava_plain,
    "v0_plain": conv_llava_plain,
    "llava_v0": conv_llava_v0,
    "v0_mmtag": conv_llava_v0_mmtag,
    "llava_v1": conv_llava_v1,
    "v1_mmtag": conv_llava_v1_mmtag,
    "llava_llama_2": conv_llava_llama_2,
    "mpt": conv_mpt,
}
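As a quick check of the tokenizer settings mentioned above, the special tokens can be printed directly with transformers; a minimal sketch (the checkpoint id is only an example, substitute the Qwen model you actually train):

# Sketch: inspect the Qwen tokenizer special tokens (model id is just an example).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
print(tokenizer.bos_token)     # None for Qwen tokenizers
print(tokenizer.eos_token)     # "<|endoftext|>" here (some chat checkpoints may use "<|im_end|>")
print(tokenizer.pad_token_id)  # worth checking too: the masking code below calls target.ne(pad_token_id)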
And then, I added the method preprocess_qwen_2 in train.py:
def preprocess_qwen_2(
    sources,
    tokenizer: transformers.PreTrainedTokenizer,
    has_image: bool = False
) -> Dict:
    conv = conversation_lib.default_conversation.copy()
    roles = {"human": conv.roles[0], "gpt": conv.roles[1]}

    # Apply prompt templates
    conversations = []
    for i, source in enumerate(sources):
        if roles[source[0]["from"]] != conv.roles[0]:
            # Skip the first one if it is not from human
            source = source[1:]

        conv.messages = []
        for j, sentence in enumerate(source):
            role = roles[sentence["from"]]
            assert role == conv.roles[j % 2], f"{i}"
            conv.append_message(role, sentence["value"])
        conversations.append(conv.get_prompt())

    # Tokenize conversations
    if has_image:
        input_ids = torch.stack([tokenizer_image_token(prompt, tokenizer, return_tensors='pt') for prompt in conversations], dim=0)
    else:
        input_ids = tokenizer(
            conversations,
            return_tensors="pt",
            padding="longest",
            max_length=tokenizer.model_max_length,
            truncation=True,
        ).input_ids

    targets = input_ids.clone()

    assert conv.sep_style == conversation_lib.SeparatorStyle.QWEN_2

    # Mask targets
    sep = conv.sep + conv.roles[1] + ": "
    for conversation, target in zip(conversations, targets):
        total_len = int(target.ne(tokenizer.pad_token_id).sum())

        rounds = conversation.split(conv.sep2)
        rounds_len = len(rounds)
        cur_len = 0
        # target[:cur_len] = IGNORE_INDEX
        for i, rou in enumerate(rounds):
            if rou == "":
                break

            parts = rou.split(sep)
            if len(parts) != 2:
                break
            parts[0] += sep

            if has_image:
                round_ids = tokenizer_image_token(rou, tokenizer)
                instruction_ids = tokenizer_image_token(parts[0], tokenizer)
                equal_parts = [x == y for x, y in zip(round_ids, instruction_ids)]
                instruction_len = equal_parts.index(False) if False in equal_parts else len(equal_parts)
                round_len = len(round_ids)
            else:
                round_ids = tokenizer(rou).input_ids
                instruction_ids = tokenizer(parts[0]).input_ids
                equal_parts = [x == y for x, y in zip(round_ids, instruction_ids)]
                instruction_len = equal_parts.index(False) if False in equal_parts else len(equal_parts)
                round_len = len(round_ids)

            if i != 0 and not tokenizer.legacy and IS_TOKENIZER_GREATER_THAN_0_14:
                round_len += 1
                instruction_len += 1

            target[cur_len : cur_len + instruction_len] = IGNORE_INDEX

            cur_len += round_len
        target[cur_len:] = IGNORE_INDEX

        if cur_len < tokenizer.model_max_length:
            if cur_len != total_len + rounds_len - 2:
                target[:] = IGNORE_INDEX
                print(
                    f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}."
                    f" (ignored)"
                )

    return dict(
        input_ids=input_ids,
        labels=targets,
    )
def preprocess(
    sources: Sequence[str],
    tokenizer: transformers.PreTrainedTokenizer,
    has_image: bool = False
) -> Dict:
    if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.PLAIN:
        return preprocess_plain(sources, tokenizer)
    if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.LLAMA_2:
        return preprocess_llama_2(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version.startswith("v1"):
        return preprocess_v1(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version == "mpt":
        return preprocess_mpt(sources, tokenizer, has_image=has_image)
    if conversation_lib.default_conversation.version.startswith("qwen_v2"):
        return preprocess_qwen_2(sources, tokenizer, has_image=has_image)
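For the new branch to actually be hit, training also has to select this template; as far as I can tell, train.py picks the default conversation from conv_templates by model_args.version, so passing --version qwen_2 wires everything together. A minimal sketch of that wiring, assuming the additions above are in place:

# Sketch: what --version qwen_2 resolves to once conv_qwen_2 is registered.
from llava import conversation as conversation_lib

conversation_lib.default_conversation = conversation_lib.conv_templates["qwen_2"]

conv = conversation_lib.default_conversation
assert conv.version == "qwen_v2"      # routes preprocess() to preprocess_qwen_2
assert conv.sep2 == "<|endoftext|>"   # rounds are split on the Qwen EOS token when masking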
After these operations, the mismatch warning disappeared.
However, I must mention that I don't have GPUs for training now, so there may be other problems.
Hope this helps you.
Okay, after making this change, I trained the model; the loss looks normal and the mismatch warning disappeared. I trained the MM adapter from scratch on top of the pretrained Qwen_7B LLM.
@yiyexy Hello, nice catch. Training is normal for me now. Did you train on the LLaVA pretrain data? Is there any pretrain data that could be used for Chinese enhancement?
Yes, I trained on LLaVA pretrain data. Unfortunately, I don't have data to enhance the model's capability in Chinese. By the way, I'm currently developing a new data processing pipeline which may solve this problem one day.
@yiyexy Will you consider sharing your processing pipeline? Which part of the problem does it solve? There is some Chinese data, but I think its quality is poor.
@lucasjinreal I will, but it still has some problems to be solved. It's a long way off.
@yiyexy Hello, your loss doesn't look like stage 1?
BTW, you should probably use the qwen1.5-7b-chat model, otherwise you cannot SFT efficiently.
However, Qwen uses the ChatML chat format, not the LLaVA default.
How did you change it?
Hi, I hope to replace the LLM with Qwen, and I added it according to your code, but I encountered the following error. How can I resolve this?
Original Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in
- (Tensor other) didn't match because some of the arguments have invalid types: (NoneType)
- (Number other) didn't match because some of the arguments have invalid types: (NoneType)
@lucasjinreal You are right, the loss is from stage 2, and I used the qwen1.5-7b-chat model for this stage.
BTW, I didn't run into any problem with the format; the SFT training is normal. Maybe I overlooked something.
@20191864218 Maybe you need to set some parameters for Qwen1.5. #1146
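One parameter that commonly needs setting is the pad token: preprocess_qwen_2 above calls target.ne(tokenizer.pad_token_id), which would produce this kind of NoneType comparison error if the Qwen tokenizer has no pad token. A minimal sketch (the checkpoint id is only a placeholder, and reusing EOS as PAD is an assumption, not necessarily what #1146 does):

# Sketch: make sure pad_token_id is set before the masking code compares against it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")  # placeholder checkpoint
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as PAD (assumption)
assert tokenizer.pad_token_id is not None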
@yiyexy Using the llava template on a Qwen chat model might introduce unwanted output when chatting. This is a common issue: Qwen uses the ChatML format, which uses <|im_end|> as the separator.
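For reference, the ChatML layout wraps every turn in <|im_start|>/<|im_end|> markers; a minimal sketch (chatml_prompt is only an illustrative helper, not part of LLaVA or Qwen):

# Sketch of the ChatML layout Qwen chat models expect.
def chatml_prompt(system, turns):
    ret = f"<|im_start|>system\n{system}<|im_end|>\n"
    for user_msg, assistant_msg in turns:
        ret += f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        ret += f"<|im_start|>assistant\n{assistant_msg}<|im_end|>\n"
    return ret

print(chatml_prompt("You are a helpful assistant.",
                    [("Describe the image.", "It shows a cat sitting on a desk.")]))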
Thanks for your reminder. I will pay attention to this issue. I haven't trained a llava-qwen model due to a lack of GPU resources and other work commitments.
I will train a llava-qwen model as soon as possible and share the result with you.
@yiyexy Thank you. I am doing the finetune stage now. Possibly I will try converting to the ChatML format to see what happens; hoping for your result.
@yiyexy Thank you, but I've encountered some issues after making the changes. Could you help me with them? I left a comment on the link you provided.
@lucasjinreal So are you using qwen-chat for the LLaVA SFT?
Yes, I am using the ChatML format for training now; I will update with info here.
This is how the Qwen1.8b stage 2 loss currently goes:
{'loss': 2.5544, 'learning_rate': 8.585365853658537e-06, 'epoch': 0.01}
{'loss': 2.4306, 'learning_rate': 8.682926829268294e-06, 'epoch': 0.01}
{'loss': 2.584, 'learning_rate': 8.78048780487805e-06, 'epoch': 0.01}
{'loss': 2.6411, 'learning_rate': 8.878048780487806e-06, 'epoch': 0.01}
{'loss': 2.4981, 'learning_rate': 8.975609756097562e-06, 'epoch': 0.01}
{'loss': 2.4692, 'learning_rate': 9.073170731707319e-06, 'epoch': 0.01}
{'loss': 2.3996, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}
{'loss': 2.3016, 'learning_rate': 9.170731707317075e-06, 'epoch': 0.01}
@lucasjinreal I met the same problem. Can you share your code for using the qwen1.5-chat LLM?
Hi, I have finished training. I found that qwen4b can get reasonable performance. But the OCR ability is still not very good; any suggestions for enhancing OCR ability (Chinese open data)?
I used qwen1.5-7b-chat; the pretrain stage is normal, but the SFT stage loss is zero. I checked that the conversations are aligned. Are there any suggestions @lucasjinreal? During training I got a warning: checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None. Can the warning be ignored?
Seems like the inputs contain None. Check the data or add some assertions.
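A minimal sketch of the kind of assertion meant here, run on the output of preprocess_qwen_2 (the helper name is mine; treating an all-masked sample as an error rather than skipping it is an assumption):

# Sketch: fail fast when a sample ends up with every label masked, which makes the loss 0.
import torch
from llava.constants import IGNORE_INDEX

def assert_batch_has_supervision(input_ids: torch.Tensor, labels: torch.Tensor) -> None:
    assert input_ids is not None and labels is not None, "dataset returned None"
    for i, target in enumerate(labels):
        # If every position is IGNORE_INDEX, this sample contributes nothing to the loss.
        assert (target != IGNORE_INDEX).any(), f"sample {i}: all labels masked (template/tokenizer mismatch?)"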
For anyone who wants an immediate response and help with training llava, you can join this group; if the QR code is outdated, add bojuebot for an invitation.
@yiyexy Hello, do you have a link for replacing the visual encoder?
@lucasjinreal Hello, if using the Qwen-7B-base model for finetuning, is data in the ChatML format still required? Thank you for your help.
I think the base model cannot be used in a VLM; it doesn't have chat abilities.
I want to create a model solely for generating reports, without requiring strong conversational abilities. Can I use the llava fine-tuning data format when fine-tuning?
Did you verify your method? The LLaVA SFT data is designed for QA tasks, so the results might not be good if you use a base model.
I replaced both the LLM and the vision encoder, then proceeded with pretraining and finetuning with LoRA. However, I encountered an issue during inference. I am also attempting to run inference through the web interface, but it is not functioning either. The error says that not all tensors are on the same device. I don't know how to handle this; I would be extremely grateful if you could help me.
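A common first check for this class of error is to put the model and every input tensor on the same device before generation; a minimal sketch (variable and argument names follow the usual LLaVA eval scripts but are only illustrative here):

# Sketch: keep the model, the token ids and the image tensor on one device before generate.
import torch

def run_on_one_device(model, input_ids, image_tensor):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    input_ids = input_ids.to(device)
    image_tensor = image_tensor.to(device, dtype=torch.float16)
    with torch.inference_mode():
        return model.generate(input_ids, images=image_tensor, max_new_tokens=512)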
@20191864218 This error appears to be due to a corrupted weight file. Please ensure that your weight file has been saved correctly.
Thank you for your response. I merged the LoRA weights according to the merge_lora_weights.py file in LLaVA. I will double-check where the error occurred. Thanks again.