agential
agential copied to clipboard
[Feature Request]: CRITIC
Implement:
- [x] HotpotQA
- [x] TriviaQA
- [x] #87
- [x] #71
- [x] #73
- [x] #72
- [x] #80
- [x] #81
- [x] #83
- [ ] #84
- [ ] #85
- [ ] #86 (includes ALFWorld & WebShop)
The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.
Run:
- [ ] HotpotQA
- [ ] TriviaQA
- [ ] AmbigNQ
- [ ] GSM8k
- [ ] SVAMP
- [ ] TabMWP
- [ ] MBPP
- [ ] HumanEval
- [ ] ALFWorld
- [ ] WebShop
- [ ] AgentBench (includes ALFWorld & WebShop)