reflexion issues

Script for leetcode results

1

Hi, thanks for your excellent work and code! I have the following two questions and hope you can clarify them. 1. May I know the correct way to reproduce the...

shenao-zhang

[Feature Request]: Gymnasium compatibility

Hi, I was wondering if it would be possible to add compatibility with [Gymnasium](https://github.com/Farama-Foundation/Gymnasium). It is a maintained fork of OpenAI Gym and is designed as a drop-in replacement (`import...

elliottower

commit for local llm runs

epinnock

Inconsistencies with the humaneval dataset

2

Comparing the original HumanEval dataset with the one in your repository reveals some inconsistencies. For instance, three instances (HumanEval_32, HumanEval_38, and HumanEval_50) are missing from your version (https://github.com/noahshinn/reflexion/blob/main/programming_runs/benchmarks/humaneval-py.jsonl). Additionally, some...

HamedTaherkhani
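
The comparison described in this issue can be sketched as a set difference over task indices. This is a minimal illustration, not the reporter's or the maintainers' script: the field names `task_id` and `name`, the ID patterns `HumanEval/N` and `HumanEval_N_...`, and the helper names `task_numbers` and `missing_tasks` are all assumptions, and the records below are made-up stand-ins rather than real dataset rows.

```python
import json
import re


def task_numbers(jsonl_lines, id_field):
    """Collect the numeric HumanEval task index from each JSONL record."""
    numbers = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed ID shapes: "HumanEval/32" or "HumanEval_32_some_name".
        match = re.search(r"HumanEval[/_](\d+)", record[id_field])
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def missing_tasks(reference_lines, candidate_lines,
                  reference_field="task_id", candidate_field="name"):
    """Return task indices present in the reference but absent from the candidate."""
    reference = task_numbers(reference_lines, reference_field)
    candidate = task_numbers(candidate_lines, candidate_field)
    return sorted(reference - candidate)


# Made-up in-memory records standing in for the two JSONL files.
reference = ['{"task_id": "HumanEval/31"}', '{"task_id": "HumanEval/32"}']
candidate = ['{"name": "HumanEval_31_example"}']
print(missing_tasks(reference, candidate))
```

To run this against real files, pass open file handles instead of the lists, adjusting the field names to whatever the two files actually use.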

Using Ground Truth in Evaluator for HotpotQA

1

Hi there, while inspecting the code, I found that the Evaluator for HotpotQA uses the ground truth. Is that correct? Thank you!

Skevinci

Reproducing HotpotQA Results

Hi, thanks for the great work. Unfortunately, we are unable to reproduce your results for ReAct / Reflexion on HotpotQA. For example, you say that ReAct+gpt-3.5-turbo has a baseline accuracy of...

haoyb22
