Q-Instruct
What is the Evaluation Dataset
What are the benchmarks used for evaluation? E.g., in Tab. 3 and Tab. 4, which test datasets are used? In the paper, I noticed that:
The low-level visual abilities of MLLMs after low-level visual instruction tuning are quantitatively evaluated in three tasks defined by [57]
However, this is a little ambiguous to me. Did you use new data to create three similar tasks as defined in [57], or did you directly use the same Q-Bench from [57] as the test dataset?
Thanks