Evaluate against SWE-bench benchmark

Open kripper opened this issue 3 months ago • 3 comments

Version

Command-line (Python) version

Suggestion

Evaluate against SWE-bench Benchmark: https://github.com/princeton-nlp/SWE-bench

kripper avatar Mar 22 '24 04:03 kripper

Loooooooool

luyandadhlamini2 avatar Mar 22 '24 05:03 luyandadhlamini2

Loooooooool

no?

kripper avatar Mar 22 '24 05:03 kripper

@kripper this is a good suggestion and we've looked into it.

SWE-bench is geared toward assistants that work on a small part of a bigger project. We're working from a different starting point: creating full-featured projects from scratch. Creating a full project involves many other difficult challenges (e.g. software architecture, refactoring) that SWE-bench covers only partially or not at all.

As a consequence, we don't currently support the workflow that SWE-bench assumes.

Being able to take over an existing project is something we're currently working on, so in the future we'll be able to support that use case too and, as a result, compare against other tools using SWE-bench.

senko avatar Mar 22 '24 05:03 senko