[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks 🛠️
Setup
Install the required packages:
pip install -r requirements.txt
Tasks and datasets are organized as follows:
├── MATH
│   ├── algebra
│   ├── counting_and_probability
│   ├── geometry
│   ├── intermediate_algebra
│   ├── number_theory
│   ├── prealgebra
│   └── precalculus
├── TableQA
│   ├── TabMWP
│   ├── WTQ
│   └── HiTab
└── VQA
    └── GQA
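For reference, the layout above can also be expressed as a mapping from task family to datasets (a minimal sketch; the names mirror the directory tree and are not defined anywhere in the repo itself):

```python
# Task families and their datasets, mirroring the directory tree above.
DATASETS = {
    "MATH": [
        "algebra",
        "counting_and_probability",
        "geometry",
        "intermediate_algebra",
        "number_theory",
        "prealgebra",
        "precalculus",
    ],
    "TableQA": ["TabMWP", "WTQ", "HiTab"],
    "VQA": ["GQA"],
}
```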
Running Experiments
Our Method: TroVE
python run_trove.py --task_name "math/algebra"
- For MATH tasks, specify the task name as math/${dataset_name}, e.g., math/algebra.
- For TableQA and VQA tasks, directly use the dataset name: [tabmwp, wtq, hitab, gqa].

Note that the specified --task_name argument should be lowercased.
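The naming convention above can be captured in a small helper. This is a hypothetical sketch, not part of the repo; the valid names are taken from the dataset tree in the Setup section:

```python
# Hypothetical helper illustrating the --task_name convention described above.
MATH_SUBSETS = {
    "algebra", "counting_and_probability", "geometry",
    "intermediate_algebra", "number_theory", "prealgebra", "precalculus",
}
OTHER_TASKS = {"tabmwp", "wtq", "hitab", "gqa"}

def normalize_task_name(name: str) -> str:
    """Lowercase a task name and check it matches the expected convention."""
    name = name.lower()
    if name.startswith("math/"):
        subset = name.split("/", 1)[1]
        if subset not in MATH_SUBSETS:
            raise ValueError(f"unknown MATH subset: {subset}")
    elif name not in OTHER_TASKS:
        raise ValueError(f"unknown task: {name}")
    return name

print(normalize_task_name("MATH/Algebra"))  # math/algebra
```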
Baseline Methods: Primitive & Instance
python baseline.py --task_name "math/algebra" --suffix "primitive" # or "instance"
Note that for the GQA dataset, the locate_objects and visual_qa functions are implemented as FastAPI endpoints, so you need to launch the server first (as below) before running the TroVE/baseline experiments.
uvicorn server.gqa:app
Evaluation
python -m utils.eval --results_path ${RESULTS_PATH}