
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!

Results 23 agential issues

### 🤔 Reasoning _Explain the purpose of this PR..._ ### 🚧 Changes _Describe the changes made..._ ### ✅ PR Checklist - [x] Using this PR template? - [x] Linked issue?...

enhancement
design

### Feature Description Add evaluation metrics such as F1, precision, recall, exact match (EM), fuzzy match (tentative), pass@k, and any others relevant to our currently supported benchmarks. ### Reason _No response_

enhancement
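
To make the metrics request above concrete, here is a minimal sketch of three of the listed metrics: exact match, token-level precision/recall/F1, and pass@k. All function names (`normalize`, `exact_match`, `precision_recall_f1`, `pass_at_k`) are hypothetical and are not part of agential's API. The normalization follows the common QA convention of lowercasing and stripping punctuation and English articles, and pass@k uses the standard unbiased estimator `1 - C(n-c, k) / C(n, k)`. Fuzzy match is left out because the issue itself marks it as undecided.

```python
import re
import string
from collections import Counter
from math import comb


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def precision_recall_f1(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples drawn, c of them correct, budget of k."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `exact_match("The Eiffel Tower.", "eiffel tower")` holds under this normalization, which is one reason to keep normalization a separate, swappable step: benchmarks differ in how strictly answers should be compared.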

### Feature Description - an `evaluator` module under `eval` that can both do output parsing and evaluation for each agent so the user doesn't need to generate then write their...

enhancement
Priority: Medium

### Feature Description https://arxiv.org/abs/2310.04406 **Implement**: - [x] #216 - [x] #217 - [x] #219 - [x] #218 - [x] #220 - [x] #221 - [x] #222 - [x] #223 -...

enhancement
Priority: High
method

### Feature Description https://arxiv.org/abs/2310.04406 **Implement**: - [x] #233 - [x] #234 - [x] #236 - [x] #235 - [x] #237 - [x] #238 - [x] #239 - [x] #240 -...

enhancement
Priority: High
method

### Feature Description There's an argument to be made that the mechanism of generating critique is somewhat agentic. Let's keep Self-Refine, but we will re-introduce it at least after https://github.com/agential-ai/agential/milestone/4...

enhancement
Priority: High

### Feature Description **Implement**: - [x] HotpotQA - [x] #123 - [x] #122 - [x] #124 - [x] #125 - [x] #126 - [x] #127 - [x] #128 - [x]...

enhancement
Priority: High
method

### Feature Description The MATH benchmark is harder than GSM8K and may be worth including down the line. ### Reason _No response_

enhancement
Priority: Low

### Feature Description **Implement**: - [x] HotpotQA - [x] #89 - [x] FEVER - [x] #90 - [x] #91 - [x] #92 - [x] #93 - [x] #94 - [x]...

enhancement
Priority: High
method