fengji.zhang issues

Results 9 issues of fengji.zhang

Bugs for automated example input/output test case extraction

Hi there! CodeRL is a brilliant idea, thanks for the effort! I have also dealt with the APPS dataset, and I found it hard to extract example test cases in...

details of the Elo rating algorithm

Nice work! Interested in the design of 1 vs 1 battles between LVLMs, but can you share more details about the Elo rating algorithm? Like the choice of k-factor, the...
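
For context, the textbook Elo update that this question refers to can be sketched as below. This is only the standard formulation, not the project's actual implementation: the function names, the k-factor of 32, and the 400-point scale are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one 1 vs 1 battle.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    k (the k-factor) is an assumed value; it controls how far one result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated models; A wins one battle.
    print(elo_update(1000.0, 1000.0, 1.0))
```

A larger k makes ratings react faster to each battle but also makes them noisier, which is why the choice of k-factor matters when the number of battles is small.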

Error building the pathminer package

![image](https://user-images.githubusercontent.com/22430500/140607068-01837cd2-9bcc-4789-95ad-a5d7ad6c5417.png) Hi! I am trying to reproduce your code and ran into a problem when rebuilding the pathminer Kotlin project. Here is a package named astminer, but...

[BUG] <title>QwenVL2: cannot set temperature and sample_num on the Alibaba Cloud Bailian platform

### Is there an existing issue / discussion for this? - [X] I have searched the existing issues / discussions ### Is this question answered in the FAQ? | Is there an...

Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation

Congratulations on the impressive work! I would like to suggest expanding the evaluation of visual reasoning to the **HumanEval-V** benchmark. This benchmark provides a more challenging set of tasks by...

[Feature] Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation

### Motivation I would like to suggest expanding the evaluation of visual reasoning to the **HumanEval-V** benchmark. This benchmark provides a more challenging set of tasks by introducing **complex diagrams**...

Where is ../data/apps_metric?

I want to reproduce the evaluation pipeline for APPS, but it seems that `../data/apps_metric`, which is invoked in `test_apps.py`, has been removed. How am I supposed to run the evaluation for...

💡 [REQUEST] - Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation

### Start Date _No response_ ### Implementation PR _No response_ ### Reference Issues _No response_ ### Summary Congratulations on the impressive work!...

question