Hi, is there a bug in Video-LLaVA-main/videollava/model/multimodal_encoder/builder.py?
I want to fine-tune starting from the native LLaMA and LanguageBind.
In principle, if the model is downloaded locally, the code takes the first "if" branch (because is_absolute_path_exists is True), but this causes a mismatch error.
If I manually switch to the second branch, it says the image tower and video tower have different hidden dims.
All my configuration files were pulled from Hugging Face, so there should be no configuration errors. What causes this strange behavior?
What is your "image tower"? The assertion function enforces the encoder's output dimension to be 1024. It appears that 768 is the dimension for a base version of the image encoder.
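To see which dimension a local checkpoint actually declares, something like the following might help. It is only a sketch: it assumes the tower directory contains a config.json that stores the vision settings either under a "vision_config" key (CLIP-style) or at the top level, and the helper names vision_hidden_size and check_tower are made up for illustration, not part of Video-LLaVA.

```python
import json
from pathlib import Path


def vision_hidden_size(model_dir):
    """Return the vision hidden size recorded in <model_dir>/config.json, or None."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # Assumption: CLIP-style configs nest the vision settings under "vision_config";
    # vision-only configs keep them at the top level.
    vision = config.get("vision_config", config)
    return vision.get("hidden_size")


def check_tower(model_dir, expected=1024):
    """Raise ValueError if the tower's hidden size differs from the expected one."""
    found = vision_hidden_size(model_dir)
    if found != expected:
        raise ValueError(f"{model_dir}: hidden_size={found}, expected {expected}")
    return found
```

Calling check_tower on the image tower directory and on the video tower directory should show quickly whether one of them is a 768-dim base variant or is missing the field altogether.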
I have the same problem on my local machine, but it works on https://colab.research.google.com/. The error looks like: RuntimeError: Error(s) in loading state_dict for CLIPVisionModel: size mismatch for vision_model.embeddings.class_embedding: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
Same issue.
Hi everyone, what is your "image_tower"? Is there a minimal runnable example to help me reproduce the error?
Config file:
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mm_hidden_size": 1024,
"mm_image_tower": "/home/demi/model_lib/LanguageBind_Image",
"mm_projector_type": "mlp2x_gelu",
"mm_use_x_patch_token": false,
"mm_use_x_start_end": false,
"mm_video_tower": "/home/demi/model_lib/LanguageBind_Video_merge",
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"model_type": "llava",
"num_attention_heads": 32,
If you want to run the model locally, you can refer to this issue: https://github.com/PKU-YuanGroup/Video-LLaVA/issues/57#issuecomment-1880367313
I solved it! I changed the code like this:
def build_image_tower(image_tower_cfg, **kwargs):
    image_tower = getattr(image_tower_cfg, 'mm_image_tower', getattr(image_tower_cfg, 'image_tower', None))
    is_absolute_path_exists = os.path.exists(image_tower)
    # if is_absolute_path_exists or image_tower.startswith("openai") or image_tower.startswith("laion"):
    #     return CLIPVisionTower(image_tower, args=image_tower_cfg, **kwargs)
    if image_tower.startswith("openai") or image_tower.startswith("laion"):
        return CLIPVisionTower(image_tower, args=image_tower_cfg, **kwargs)
    if image_tower.endswith('LanguageBind_Image'):
        return LanguageBindImageTower(image_tower, args=image_tower_cfg, cache_dir='./cache_dir', **kwargs)
    if 'mae' in image_tower:
        print('maemaemaemaemaemaemaemae')
        print('maemaemaemaemaemaemaemae')
        print('maemaemaemaemaemaemaemae')
        print('maemaemaemaemaemaemaemae')
        print('maemaemaemaemaemaemaemae')
        return MAEVisionTower(image_tower, args=image_tower_cfg, cache_dir='./cache_dir', **kwargs)
    raise ValueError(f'Unknown image tower: {image_tower}')
In fact, if you run locally, you should take the second "if" (the LanguageBind branch). I haven't changed anything else, but the "mismatch" error disappeared. It's still weird, but anyway, it works now!
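For anyone wondering why the order of the checks matters here, a stripped-down sketch of the dispatch is below. It is not the repository's code: pick_image_tower is a hypothetical helper that only returns the class name as a string, so it runs without Video-LLaVA installed. The idea is that the most specific pattern is tested first, so a local LanguageBind_Image directory is never caught by a generic "path exists" fallback and handed to the CLIP tower.

```python
def pick_image_tower(image_tower):
    """Return the name of the tower class a builder would choose (illustrative only)."""
    # Most specific check first: a local or hub path ending in LanguageBind_Image.
    if image_tower.rstrip("/").endswith("LanguageBind_Image"):
        return "LanguageBindImageTower"
    # Hub identifiers for CLIP checkpoints.
    if image_tower.startswith(("openai", "laion")):
        return "CLIPVisionTower"
    if "mae" in image_tower:
        return "MAEVisionTower"
    raise ValueError(f"Unknown image tower: {image_tower}")


print(pick_image_tower("/home/demi/model_lib/LanguageBind_Image"))
print(pick_image_tower("openai/clip-vit-large-patch14"))
```

With this ordering the local path from the config above resolves to the LanguageBind tower, which matches what the edited build_image_tower does after the first "if" is commented out.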
Great! Congrats