intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
I ran `eval_n_turn.py` to reproduce the **single-turn handicap SQL** results ```bash python -m experiments.eval_n_turn \ --data_path ./data/sql/spider/ic_spider_dev.json \ --dialogue_limit 5 \ --env sql \ --image_name docker-env-sql \ --log_dir...
When I run this script to test nl2bash, I get an error: **exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown** ``` python -m experiments.eval_n_turn \ --data_path ./data/nl2bash/nl2bash_fs_2.json...
Hi, Thanks for building this environment, that's a really great contribution! I am just wondering why there are no MBPP results in the paper and leaderboard? Best, Jiyang Zhang
Currently, the latest release on PyPI is from June '23. https://pypi.org/project/intercode-bench/0.1.22/ Can you upload the new 1.0.1 release containing the rewritten CTF environment?
Thanks for releasing the environment code; this is really nice work! During my re-implementation of the experiments in your paper, I was not sure about what...
Hi, I was trying to run the tests using `pytest` and I realized that a lot of data dependencies for tests do not exist in the repository. Would it be...
When I run the following scripts: ` SQL Call python -m experiments.eval_n_turn \ --data_path ./data/sql/spider/ic_spider_dev.json \ --dialogue_limit 5 \ --env sql \ --image_name docker-env-sql \ --log_dir logs/experiments \ --max_turns 10...
Hi Authors, Thanks for building this environment, that's a really great contribution. I was wondering if it's possible to extend the codebase and either get rid of the dependency on...
1. The OpenAI API has been updated, requiring changes to the code. 2. Task assets cause errors during experiments for tasks 1 and 14. For task 1, I downloaded the...
The current leaderboard results (e.g. bash) are out of date and do not reflect the major model updates released in 2025. To ensure fair and representative evaluation, please update the...