MiniCPM-V
💡 [REQUEST] - Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
Congratulations on the impressive work!
I would like to suggest expanding the evaluation of visual reasoning to the HumanEval-V benchmark. This benchmark provides a more challenging set of tasks by introducing complex diagrams paired with coding challenges. Unlike traditional visual reasoning tasks that focus on answering multiple-choice questions or providing short answers, HumanEval-V requires models to generate code based on visual input, which better tests both instruction-following and open-ended generation abilities.
Key points for consideration:
- HumanEval-V expands the reasoning scenarios with complex diagrams, pushing the limits of visual understanding.
- The task format is tailored to code generation, making it a suitable benchmark for testing MLLMs’ ability to handle more structured, generative tasks.
- Evaluating on this benchmark would provide valuable insight into how well MiniCPM-V handles visual reasoning combined with code generation, since generated solutions can be scored and rewarded objectively through execution feedback.
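To illustrate the execution-feedback idea mentioned above, here is a minimal sketch of how a generated solution could be scored by running it against test assertions. This is a hypothetical harness for illustration only, not the official HumanEval-V evaluation code; the function names, the sample solution, and the test string are all invented for this example.

```python
def execute_and_score(solution_code: str, test_code: str) -> bool:
    """Return True if the generated solution passes all test assertions.

    Hypothetical sketch: real harnesses typically sandbox execution and
    enforce time/memory limits, which are omitted here for brevity.
    """
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the generated function
        exec(test_code, namespace)      # run assertions against it
        return True
    except Exception:
        return False  # any error or failed assertion means no reward


# Example with a made-up task: count 'x' marks in a diagram-like grid
generated = (
    "def count_shapes(grid):\n"
    "    return sum(row.count('x') for row in grid)"
)
tests = "assert count_shapes(['x.x', '.x.']) == 3"
print(execute_and_score(generated, tests))  # True: all assertions pass
```

A pass/fail signal like this is what makes the benchmark amenable to automatic metrics such as pass@k, rather than relying on string matching against a reference answer.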
Basic Example
You can find more information about the benchmark here: HumanEval-V Homepage.
Drawbacks
NA
Unresolved Questions
No response