agential issues

Evaluation Metrics, Fix-ups, Output Parsing for Evaluation

4

### 🤔 Reasoning _Explain the purpose of this PR..._ ### 🚧 Changes _Describe the changes made..._ ### ✅ PR Checklist - [x] Using this PR template? - [x] Linked issue?...

alckasoc

enhancement

design

[Feature Request]: Standard way to load and dump state dicts (like torch)

### Feature Description . ### Reason _No response_

alckasoc

enhancement

[Feature Request]: Evaluation Metrics

### Feature Description Evaluation metrics such as F1, precision, recall, exact match (EM), fuzzy match (?), pass@k, and any others relevant to our currently supported benchmarks. ### Reason _No response_

alckasoc

enhancement
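
For illustration only, a minimal sketch of a few of the metrics named in the Evaluation Metrics request. The function names (`normalize_answer`, `exact_match`, `precision_recall_f1`, `pass_at_k`) are hypothetical and are not part of agential. The normalization follows the common SQuAD-style convention, which is an assumption here, and `pass_at_k` uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k).

```python
import re
import string
from collections import Counter
from math import comb


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """EM: the normalized strings must be identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def precision_recall_f1(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-level precision, recall and F1 between prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k given n samples of which c are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Fuzzy match is left out because the request itself marks it as uncertain; benchmark-specific answer normalization (for example numeric answers in GSM8K) would need its own handling.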

[Feature Request]: Eval Framework/Harness

### Feature Description - an `evaluator` module under `eval` that can both do output parsing and evaluation for each agent so the user doesn't need to generate then write their...

alckasoc

enhancement

Priority: Medium

[Feature Request]: LATS

### Feature Description https://arxiv.org/abs/2310.04406 **Implement**: - [x] #216 - [x] #217 - [x] #219 - [x] #218 - [x] #220 - [x] #221 - [x] #222 - [x] #223 -...

alckasoc

enhancement

Priority: High

method

[Feature Request]: ExpeL

### Feature Description https://arxiv.org/abs/2308.10144 **Implement**: - [x] #233 - [x] #234 - [x] #236 - [x] #235 - [x] #237 - [x] #238 - [x] #239 - [x] #240 -...

alckasoc

enhancement

Priority: High

method

[Feature Request]: Re-introduce Self-Refine

### Feature Description There's an argument to be made that the mechanism of generating critique is somewhat agentic. Let's keep Self-Refine, but we will re-introduce it at least after https://github.com/agential-ai/agential/milestone/4...

alckasoc

enhancement

Priority: High

[Feature Request]: Reflexion

### Feature Description **Implement**: - [x] HotpotQA - [x] #123 - [x] #122 - [x] #124 - [x] #125 - [x] #126 - [x] #127 - [x] #128 - [x]...

alckasoc

enhancement

Priority: High

method

[Feature Request]: MATH Benchmark

### Feature Description The MATH benchmark is harder than GSM8K and may be worth including down the line. ### Reason _No response_

alckasoc

enhancement

Priority: Low

[Feature Request]: ReAct

### Feature Description **Implement**: - [x] HotpotQA - [x] #89 - [x] FEVER - [x] #90 - [x] #91 - [x] #92 - [x] #93 - [x] #94 - [x]...

alckasoc

enhancement

Priority: High

method
