Q-Instruct

What is the Evaluation Dataset?

Open · dacian7 opened this issue 7 months ago · 0 comments

Which benchmarks are used for evaluation? For example, which test datasets are used in Tab. 3 and Tab. 4? In the paper, I noticed that:

The low-level visual abilities of MLLMs after low-level visual instruction tuning are quantitatively evaluated in three tasks defined by [57]

However, this is a little ambiguous to me. Did you use new data to build three tasks similar to those defined in [57], or did you directly use the same Q-Bench from [57] as the test dataset?

Thanks

dacian7, Jul 09 '24 22:07