LLaVA-NeXT

KeyError: 'llava' when evaluating LLaVA-OneVision

Open · Bleking opened this issue 1 year ago · 0 comments

https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision.md#evaluating-llava-onevision-on-multiple-datasets

I followed the instructions there to evaluate a LLaVA-OneVision model fine-tuned on my dataset.

This is my command; I used the '--include_path' argument for my fine-tuned model:

accelerate launch --num_processes=8 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-0.5b-si,conv_template=qwen_1_5,model_name=llava_qwen \
    --tasks floorplan_test \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_onevision \
    --output_path ./logs/ \
    --include_path /home/work/testdataset1/LLaVA-NeXT/checkpoints/OneVision-results/OneVision-siglip-Qwen2-0.5B

Also, since I am evaluating the fine-tuned model on my own test dataset, I wrote my own yaml file, "floorplan_test.yaml":

dataset_path: Bleking/Floorplan
dataset_kwargs:
  token: True
task: "floorplan_test"
test_split: test
output_type: generate_until
doc_to_target: "answer"
generation_kwargs:
  max_new_tokens: 1000
  temperature: 0
  top_p: 1.0
  num_beams: 1
  do_sample: false
# The return value of process_results will be used by metrics
# process_results: !function utils.floorplan_process_results
# Note that the metric name can be either a registered metric function (such as the case for GQA) or a key name returned by process_results
metric_list:
  - metric: 
    # aggregation : !function utils.floorplan_aggregate_results
    higher_is_better : true
metadata:
  - version: 1.0
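
For reference, the commented-out hooks above would point at a utils.py placed next to the yaml, roughly like the skeleton below (my own sketch, modeled on how existing lmms-eval tasks such as GQA wire these functions; the metric key "floorplan_acc" is just a placeholder name of mine):

# utils.py, next to floorplan_test.yaml (sketch, not tested)
def floorplan_process_results(doc, results):
    # results[0] is the model's generated answer for this document
    pred = str(results[0]).strip().lower()
    target = str(doc["answer"]).strip().lower()
    # the key below is what metric_list would reference
    return {"floorplan_acc": 1.0 if pred == target else 0.0}

def floorplan_aggregate_results(results):
    # results is the list of per-sample values collected above
    return sum(results) / len(results) if results else 0.0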

The error I keep running into is this:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/work/testdataset1/lmms-eval/lmms_eval/__main__.py", line 548, in <module>
    cli_evaluate()
  File "/home/work/testdataset1/lmms-eval/lmms_eval/__main__.py", line 343, in cli_evaluate
    raise e
  File "/home/work/testdataset1/lmms-eval/lmms_eval/__main__.py", line 327, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/home/work/testdataset1/lmms-eval/lmms_eval/__main__.py", line 482, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/home/work/testdataset1/lmms-eval/lmms_eval/utils.py", line 527, in _wrapper
    return fn(*args, **kwargs)
  File "/home/work/testdataset1/lmms-eval/lmms_eval/evaluator.py", line 166, in simple_evaluate
    lm = ModelClass.create_from_arg_string(
  File "/home/work/testdataset1/lmms-eval/lmms_eval/api/model.py", line 91, in create_from_arg_string
    return cls(**args, **args2)
  File "/home/work/testdataset1/lmms-eval/lmms_eval/models/llava_onevision.py", line 127, in __init__
    cfg_pretrained = AutoConfig.from_pretrained(self.pretrained)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1039, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 734, in __getitem__
    raise KeyError(key)
KeyError: 'llava'

I believe I am loading the same base model for my task, lmms-lab/llava-onevision-qwen2-0.5b-si, but I have no idea why I keep getting that KeyError: 'llava'.

I can see that the KeyError depends on the "model_type" value in the 'config.json' of the model passed as the 'pretrained' value of '--model_args'. For example, because that 'config.json' has "model_type": "llava", I get KeyError: 'llava'.
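
The lookup the traceback points to can be reproduced in isolation (a minimal sketch of mine, not lmms-eval code):

# AutoConfig.from_pretrained reads "model_type" from config.json and indexes
# CONFIG_MAPPING with it; an unregistered key raises exactly the KeyError shown above.
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

for model_type in ("llava", "llava_qwen", "qwen2"):
    print(model_type, model_type in CONFIG_MAPPING)
# If "llava" prints False (as the traceback implies for my setup), the KeyError follows.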

So my questions include:

  1. Is it correct to pass "lmms-lab/llava-onevision-qwen2-0.5b-si" as the pretrained model like that even though I am using my own fine-tuned LLaVA-OneVision? Or do I have to point it to the directory of my fine-tuned model, whose "model_type" is 'qwen2' since it is a fine-tuned LLaVA-OneVision-Qwen2-0.5B?
  2. I am using the llava_onevision model class directly; is this the correct way to evaluate my own fine-tuned LLaVA-OneVision?
  3. Which settings did I get wrong, in my yaml or in the command line?

For context, I am also attaching an image of the 'config.json' file and the checkpoint directory. I used merge_lora_weights.py to merge the LoRA weights so the model is ready to be evaluated.

[image: config.json and checkpoint directory]
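
For reference, this is roughly how the two "model_type" values can be checked by reading the raw config.json files (my own sketch; the local path is the checkpoint directory from my command above, assuming config.json sits at its top level):

# Read the raw JSON so the check works even when AutoConfig rejects the model_type.
import json
from huggingface_hub import hf_hub_download

hub_cfg = hf_hub_download("lmms-lab/llava-onevision-qwen2-0.5b-si", "config.json")
local_cfg = "/home/work/testdataset1/LLaVA-NeXT/checkpoints/OneVision-results/OneVision-siglip-Qwen2-0.5B/config.json"

for path in (hub_cfg, local_cfg):
    with open(path) as f:
        print(path, "->", json.load(f)["model_type"])
# Expected from the observations above: "llava" for the hub checkpoint, "qwen2" for my merged one.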

Bleking · Sep 13 '24, 10:09