LLaVA-NeXT
inference with LLM and vision frozen
Dear all, I have fine-tuned LLaVA-OneVision models (0.5B and 7B) with the LLM and vision components frozen, so only the multimodal projector was trained. The checkpoint output directory contains these files:
```
runs/
checkpoint-1000/
...
checkpoint-22000/
trainer_state.json
mm_projector.bin
config.json
```
Loading the trained model with `load_pretrained_model(model_path, model_base=None, model_name='llava_qwen', device_map='auto')` fails because no tokenizer files are found in `model_path`. When I copy the tokenizer files (`tokenizer_config.json`, `tokenizer.json`) over from the base model, loading fails with this error:

```
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory.
```
I had a look at the `load_pretrained_model` method in `builder.py`. It seems that I should set `model_base` to the base model (e.g. `lmms-lab/llava-onevision-qwen2-0.5b-ov`) rather than leaving it as `None`. It also seems that the logic for loading a Qwen model is missing from the method.
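Concretely, the call I am aiming for looks like this (a minimal sketch; the checkpoint path is illustrative):

```python
from llava.model.builder import load_pretrained_model

# model_path points at the fine-tuned checkpoint directory (the one that
# contains mm_projector.bin and config.json); model_base is the original
# LLaVA-OneVision model the projector was trained on top of.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="./output/checkpoint-22000",  # illustrative path
    model_base="lmms-lab/llava-onevision-qwen2-0.5b-ov",
    model_name="llava_qwen",
    device_map="auto",
)
```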
I tried to add this code:
elif "qwen" in model_name.lower():
from llava.model.language_model.llava_qwen import LlavaQwenConfig, LlavaQwenForCausalLM
tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False)
if overwrite_config is not None:
llava_cfg = LlavaQwenConfig.from_pretrained(model_path)
rank0_print(f"Overwriting config with {overwrite_config}")
for k, v in overwrite_config.items():
setattr(llava_cfg, k, v)
model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, config=llava_cfg, **kwargs)
else:
model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, **kwargs)
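Note that the tokenizer and the LLM weights are taken from `model_base`, while the config is read from `model_path` (the fine-tuned checkpoint) when `overwrite_config` is given, mirroring the neighbouring branches in the method.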
I inserted it right after this existing block in `builder.py`:
```python
elif model_base is not None:
    ...
    elif (
        "wizardlm-2" in model_name.lower()
        and "vicuna" in model_name.lower()
        or "llama" in model_name.lower()
        or "yi" in model_name.lower()
        or "nous-hermes" in model_name.lower()
        or "llava-v1.6-34b" in model_name.lower()
        or "llava-v1.5" in model_name.lower()
    ):
        ...
        model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=llava_cfg, **kwargs)
    # [ADD CODE HERE]
```
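With the Qwen branch in place, execution falls through to the shared code further down in the same `model_base is not None` path, which, if I am reading `builder.py` correctly, restores the fine-tuned projector from the checkpoint with something like:

```python
# Paraphrased sketch of the shared projector-loading step: the trained
# weights in mm_projector.bin are loaded on top of the base model.
mm_projector_weights = torch.load(os.path.join(model_path, "mm_projector.bin"), map_location="cpu")
mm_projector_weights = {k: v.to(torch.float16) for k, v in mm_projector_weights.items()}
model.load_state_dict(mm_projector_weights, strict=False)
```

This would explain why pointing `model_path` at the checkpoint directory is enough to recover the trained projector.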
I managed to load the model with this fix. Can you please confirm whether this is correct, or whether I am doing something wrong? Thanks a lot for your help.