
💡 [REQUEST] - Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation

Open · zfj1998 opened this issue 1 year ago

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Congratulations on the impressive work!

I would like to suggest extending the visual-reasoning evaluation to the HumanEval-V benchmark. It poses a more demanding set of tasks by pairing complex diagrams with coding problems. Unlike traditional visual reasoning tasks that ask for multiple-choice or short answers, HumanEval-V requires models to generate code from visual input, which better tests both instruction following and open-ended generation.

Key points for consideration:

  • HumanEval-V broadens the range of reasoning scenarios with complex diagrams, pushing the limits of visual understanding.
  • The task format is tailored to code generation, making it a suitable benchmark for testing MLLMs’ ability to handle more structured, generative tasks.
  • Evaluating on it would show how well MiniCPM-V handles visual reasoning combined with coding, and correctness can be scored automatically (and even used as a reward signal) through execution feedback; see the sketch after this list.
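
To make the execution-feedback point concrete, here is a minimal sketch of how generated solutions could be scored by actually running them against each task's unit tests. This is not the benchmark's official harness; `candidate_code` and `test_code` are hypothetical stand-ins for a sampled solution and the task's tests.

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run a generated solution together with its unit tests in a
    subprocess and report whether the tests exit cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        # Concatenate the model's code with the task's tests into one script.
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def pass_at_1(samples: list[tuple[str, str]]) -> float:
    """pass@1 over (candidate_code, test_code) pairs: the fraction of
    tasks whose single sampled solution passes its tests."""
    if not samples:
        return 0.0
    return sum(passes_tests(code, tests) for code, tests in samples) / len(samples)
```

Because every task yields an unambiguous pass/fail result, the same signal could double as an execution-based reward for fine-tuning, as noted above.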

Basic Example

You can find more information about the benchmark here: HumanEval-V Homepage.
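
As a rough illustration of the task format (all field names below are hypothetical; the actual schema is documented on the homepage), each item pairs a diagram with a Python function signature that the model must complete:

```python
# Illustrative only: a HumanEval-V-style item couples an image with a
# function signature to complete. Field names here are hypothetical.
task = {
    "image_path": "diagram.png",  # the visual context the model must interpret
    "signature": "def solve(grid: list[list[int]]) -> int:",
    "instruction": "Complete the function so it implements the rule shown in the diagram.",
}

# The model is prompted with the image plus the text below and must return a
# complete function body, which is then verified by executing the task's tests.
prompt = f"{task['instruction']}\n\n{task['signature']}\n    ..."
```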

Drawbacks

NA

Unresolved questions

No response

zfj1998 · Feb 25, 2025