
Vision tower parameter question

Open liuyijiang1994 opened this issue 1 year ago • 8 comments

Judging from the paper, the vision tower was tuned, but the config.json of lmms-lab/llava-onevision-qwen2-7b-ov contains "mm_vision_tower": "google/siglip-so400m-patch14-384", which looks like the original vision tower is being loaded. The question is whether the original vision tower is loaded first and its parameters then overwritten. During the overwrite there is a warning:

envs/llavaov/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for vision_model.encoder.layers.21.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '

It looks like the parameter overwrite did not take effect, and the final outputs are not quite normal either; I am not sure what the problem is.

V100; environment set up following the .toml file.
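For reference, the warning text points at PyTorch's meta-device loading path. A minimal standalone repro of what it means (a sketch in plain PyTorch >= 2.1, not the LLaVA loading code):

import torch
import torch.nn as nn

# A module built on the meta device has placeholder parameters; the default
# load_state_dict() copies in place, which is a no-op for meta tensors.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

state_dict = nn.Linear(4, 4).state_dict()

layer.load_state_dict(state_dict)               # emits the same UserWarning
print(layer.weight.is_meta)                     # True: the "copy" changed nothing

layer.load_state_dict(state_dict, assign=True)  # assign instead of copy
print(layer.weight.is_meta)                     # False: real weights are attached

So the warning is harmless only if the loader later materializes the affected weights by some other path; otherwise they stay uninitialized.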

liuyijiang1994 avatar Sep 25 '24 04:09 liuyijiang1994

> Judging from the paper, the vision tower was tuned, but the config.json of lmms-lab/llava-onevision-qwen2-7b-ov points "mm_vision_tower" at "google/siglip-so400m-patch14-384" [...] it looks like the parameter overwrite did not take effect.

Ran into the same problem. @Luodian @kcz358 @ZhangYuanhan-AI

PaulWongDlut avatar Oct 15 '24 02:10 PaulWongDlut

Adding this llava_model_args before loading the model should avoid the warning:

llava_model_args = {
    "multimodal": True,
    "attn_implementation": "sdpa",
}
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None, model_name, device_map=device_map, **llava_model_args)

The tutorial may be slightly off; I will fix it later.
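To check whether the tuned tower actually replaced the vanilla one, a quick sanity check can help. This is a sketch: it assumes the loaded model exposes get_vision_tower() as in this repo, and that the wrapper keeps the inner SigLIP-style module in a vision_tower attribute.

import torch
from transformers import SiglipVisionModel

# Vanilla weights from the path named in "mm_vision_tower".
vanilla = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")

tower = model.get_vision_tower().vision_tower  # inner vision model (assumption)
name = "vision_model.encoder.layers.21.self_attn.k_proj.weight"  # key from the warning

param = dict(tower.named_parameters())[name]
print(param.is_meta)  # True would mean the overwrite never happened
if not param.is_meta:
    orig = dict(vanilla.named_parameters())[name]
    # A tuned tower should differ from the vanilla release, so False is expected.
    print(torch.allclose(param.detach().float().cpu(), orig.detach().float().cpu()))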

kcz358 avatar Oct 16 '24 03:10 kcz358

Ran into the same problem as well.

eternal8080 avatar Oct 23 '24 07:10 eternal8080

> Adding this llava_model_args before loading the model should avoid the warning [...] The tutorial may be slightly off; I will fix it later.

Can this problem cause some models' weights to be loaded incorrectly? After following this suggestion ("add the llava_model_args before loading the model") and running llava/eval/model_vqa.py, the warning above still appears.

eternal8080 avatar Oct 23 '24 15:10 eternal8080

> Can this problem cause some models' weights to be loaded incorrectly? After following this suggestion and running llava/eval/model_vqa.py, the warning above still appears.

I ran into this problem too.

ningshanl avatar Nov 04 '24 11:11 ningshanl

It seems a recent version has solved this problem. The Hugging Face checkpoint contains the vision encoder parameters, which overwrite the original SigLIP encoder parameters.
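One way to confirm this from the published files themselves (a sketch; it assumes the usual sharded-safetensors layout with a model.safetensors.index.json in the repo):

import json
from huggingface_hub import hf_hub_download

repo = "lmms-lab/llava-onevision-qwen2-7b-ov"
index_path = hf_hub_download(repo, "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Non-empty means the checkpoint ships its own (tuned) vision tower weights,
# which should overwrite the vanilla SigLIP weights at load time.
tower_keys = [k for k in weight_map if "vision_tower" in k]
print(len(tower_keys), tower_keys[:3])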

goodstudent9 avatar Dec 08 '24 08:12 goodstudent9

> After following this suggestion ("add the llava_model_args before loading the model") and running llava/eval/model_vqa.py, the warning above still appears.

May I ask whether this has been resolved? The main difficulty is that I cannot tell whether any parameters are actually being loaded incorrectly. I also hit this warning, and adding the args did not help; my attn_implementation is flash_attention, so I am not sure whether the situation is the same.
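One way to localize it (a sketch): after load_pretrained_model(...) returns, scan for parameters that are still on the meta device; any hit means the warning was not harmless for that weight.

meta_params = [n for n, p in model.named_parameters() if p.is_meta]
print(len(meta_params), meta_params[:5])  # an empty list means every weight was materialized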

alcholiclg avatar Jan 13 '25 09:01 alcholiclg

> May I ask whether this has been resolved? The main difficulty is that I cannot tell whether any parameters are actually being loaded incorrectly. [...]

@alcholiclg Hi, has this been resolved on your end?

chris1220313648 avatar Jun 28 '25 06:06 chris1220313648