agential
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
### 🤔 Reasoning _Explain the purpose of this PR..._ ### 🚧 Changes _Describe the changes made..._ ### ✅ PR Checklist - [x] Using this PR template? - [x] Linked issue?...
### Feature Description . ### Reason _No response_
### Feature Description Evaluation metrics such as F1, precision, recall, exact match (EM), fuzzy match (?), pass@k, and any others relevant to our currently supported benchmarks ### Reason _No response_
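The metrics above can be sketched in a few lines. This is a minimal illustration, not agential's API. All function names are hypothetical, and it makes three assumptions. EM and F1 use SQuAD-style answer normalization, as is common for HotpotQA-style QA. pass@k uses the unbiased estimator from the HumanEval paper (Chen et al., 2021). "Fuzzy match" is one possible reading of the issue: a `difflib` similarity ratio with an arbitrary 0.8 threshold.

```python
import re
import string
from collections import Counter
from difflib import SequenceMatcher
from math import comb

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def precision_recall_f1(prediction: str, ground_truth: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F1 over normalized answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count_pred, count_gold) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return precision, recall, 2 * precision * recall / (precision + recall)


def fuzzy_match(prediction: str, ground_truth: str, threshold: float = 0.8) -> bool:
    """Assumed definition: similarity ratio of normalized strings >= threshold (0.8 is arbitrary)."""
    ratio = SequenceMatcher(None, normalize_answer(prediction), normalize_answer(ground_truth)).ratio()
    return ratio >= threshold


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `precision_recall_f1("the cat sat", "cat sat down")` gives precision 1.0, recall 2/3, and F1 0.8. `pass_at_k(10, 3, 1)` reduces to c/n = 0.3.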
### Feature Description - an `evaluator` module under `eval` that handles both output parsing and evaluation for each agent so the user doesn't need to generate then write their...
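One way to structure such a module is a shared parse-then-score interface with one subclass per agent or benchmark. The sketch below is purely illustrative. `BaseEvaluator`, `EvalResult`, and `FinishActionEvaluator` are hypothetical names, not existing agential classes. It also assumes a ReAct-style output whose final answer appears in a `Finish[answer]` action, scored by case-insensitive exact match.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class EvalResult:
    parsed_answer: str
    is_correct: bool


class BaseEvaluator(ABC):
    """Hypothetical interface: parse a raw agent output, then score the parsed answer."""

    @abstractmethod
    def parse(self, raw_output: str) -> str: ...

    @abstractmethod
    def score(self, parsed_answer: str, reference: str) -> bool: ...

    def evaluate(self, raw_output: str, reference: str) -> EvalResult:
        parsed = self.parse(raw_output)
        return EvalResult(parsed_answer=parsed, is_correct=self.score(parsed, reference))


class FinishActionEvaluator(BaseEvaluator):
    """Hypothetical evaluator for ReAct-style outputs that end in `Finish[answer]`."""

    _pattern = re.compile(r"Finish\[(.*?)\]", re.DOTALL)

    def parse(self, raw_output: str) -> str:
        # Use the last Finish[...] action if the agent emitted several; empty string if none.
        matches = self._pattern.findall(raw_output)
        return matches[-1].strip() if matches else ""

    def score(self, parsed_answer: str, reference: str) -> bool:
        return parsed_answer.strip().lower() == reference.strip().lower()
```

With this split, a user calls `FinishActionEvaluator().evaluate(raw_output, reference)` directly on generated text instead of writing their own parsing. New agents or benchmarks only override `parse` and `score`.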
### Feature Description https://arxiv.org/abs/2310.04406 **Implement**: - [x] #216 - [x] #217 - [x] #219 - [x] #218 - [x] #220 - [x] #221 - [x] #222 - [x] #223 -...
### Feature Description https://arxiv.org/abs/2310.04406 **Implement**: - [x] #233 - [x] #234 - [x] #236 - [x] #235 - [x] #237 - [x] #238 - [x] #239 - [x] #240 -...
### Feature Description There's an argument to be made that the mechanism of generating critique is somewhat agentic. Let's keep Self-Refine, but we will re-introduce it at least after https://github.com/agential-ai/agential/milestone/4...
### Feature Description **Implement**: - [x] HotpotQA - [x] #123 - [x] #122 - [x] #124 - [x] #125 - [x] #126 - [x] #127 - [x] #128 - [x]...
### Feature Description MATH benchmark is harder than GSM8K. May be worth including down the line. ### Reason _No response_
### Feature Description **Implement**: - [x] HotpotQA - [x] #89 - [x] FEVER - [x] #90 - [x] #91 - [x] #92 - [x] #93 - [x] #94 - [x]...