fengji.zhang
Hi there! CodeRL is a brilliant idea, thanks for the effort! I have also dealt with the APPS dataset, and I found it hard to extract example test cases in...
Nice work! I'm interested in the design of 1-vs-1 battles between LVLMs. Could you share more details about the Elo rating algorithm, such as the choice of K-factor, the...
Hi! I am trying to reproduce your code and ran into a problem when rebuilding the pathminer Kotlin project. There is a package named astminer, but...
### Is there an existing issue / discussion for this? - [X] I have searched the existing issues / discussions ### Is this question answered in the FAQ? ...
Congratulations on the impressive work! I would like to suggest expanding the evaluation of visual reasoning to the **HumanEval-V** benchmark. This benchmark provides a more challenging set of tasks by...
### Motivation I would like to suggest expanding the evaluation of visual reasoning to the **HumanEval-V** benchmark. This benchmark provides a more challenging set of tasks by introducing **complex diagrams**...
I want to reproduce the evaluation pipeline for APPS, but it seems the `../data/apps_metric` invoked in `test_apps.py` has been removed. How am I supposed to run the evaluation for...
### Start Date _No response_ ### Implementation PR _No response_ ### Reference Issues _No response_ ### Summary Congratulations on the impressive work!...