AutoGPT
Enhanced coding capabilities ⌨️
Summary
Auto-GPT's coding skills aren't great when it comes to multidisciplinary projects.
Like #5179, the proposal here is to improve this with a sub-agent, either by integrating an existing third-party coding agent or by implementing our own.
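As a rough illustration of the sub-agent idea, here is a minimal sketch of how an existing third-party coding agent (Open Interpreter, in this example) could be wrapped behind a single "delegate a coding task" entry point. The wrapper function and how it would be registered as a command are hypothetical, not existing Auto-GPT APIs; the `interpreter.chat()` / `auto_run` usage is based on open-interpreter's documented interface and may differ between versions.

```python
# Hypothetical sketch: delegate a self-contained coding task to Open Interpreter.
# The function name and return handling are illustrative only; the interpreter
# entry point may differ between versions (`import interpreter` in older
# releases, `from interpreter import interpreter` in later ones).
import interpreter  # pip install open-interpreter

def run_coding_subagent(task: str) -> str:
    """Hand a coding task to the sub-agent and return its reply for the parent agent."""
    interpreter.auto_run = True  # execute generated code without interactive confirmation
    messages = interpreter.chat(task)  # recent versions return the new messages
    if not messages:
        return ""
    # Return the text of the last message as the sub-agent's result.
    last = messages[-1]
    return last.get("content", "") if isinstance(last, dict) else str(last)
```

Whether this runs in-process or as a separate tool/process is an open design question; the point is only that the parent agent needs a single well-defined hand-off to the coding sub-agent.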
Background
The following projects can be part of a solution, or can serve as inspiration.
- #5132
- #1299
Open Interpreter
Website: openinterpreter.com
Repo: KillianLucas/open-interpreter
https://github.com/KillianLucas/open-interpreter/assets/63927363/37152071-680d-4423-9af3-64836a6f7b60
GPT-Engineer
Repo: AntonOsika/gpt-engineer
https://github.com/AntonOsika/gpt-engineer/assets/4467025/6e362e45-4a94-4b0d-973d-393a31d92d9b
StarCoder
Repo: bigcode-project/starcoder
💫 StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues, commits, and notebooks. This repository showcases how we get an overview of this LM's capabilities.
Used in this PR:
- #4455
For the record, I talked to @merwanehamadi about implementing semi-automated benchmark generation using a new type of "BenchmarkGenerator" that would just be a stub to hook our benchmarks up to arbitrary HTTP APIs. The idea is to fetch tasks (= challenges) from coding competition sites (Rosetta Code, LeetCode, Kaggle, etc.) in order to procedurally generate a large number of coding-related benchmarks and use them to identify blind spots: #5536
I have shared some experimental proof-of-concept code to illustrate the concept. The idea would be to get this working for 1) Rosetta Code and 2) just the "Hello world" category, using a handful of mainstream languages (Python, JavaScript, PHP, Ruby), and then take it from there by adding more tasks from Rosetta Code.
Once that is working, we could look at adding support for LeetCode, etc.
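To make the "BenchmarkGenerator" idea concrete, here is a hedged proof-of-concept sketch. It assumes Rosetta Code's MediaWiki-style API at rosettacode.org and a simple dict-per-challenge layout; the class name, endpoint, and challenge schema are illustrative assumptions, not the benchmark harness's actual interface.

```python
# Sketch of the proposed "BenchmarkGenerator" stub: fetch tasks from an external
# HTTP API and turn them into challenge definitions. Class name, challenge dict
# layout, and the Rosetta Code endpoint are assumptions for illustration.
import requests

ROSETTA_API = "https://rosettacode.org/w/api.php"  # MediaWiki API; exact endpoint may differ

class RosettaCodeBenchmarkGenerator:
    """Generates 'Hello world' coding challenges from Rosetta Code task pages."""

    LANGUAGES = ["Python", "JavaScript", "PHP", "Ruby"]

    def fetch_task(self, page: str) -> str:
        """Fetch the raw wikitext of a Rosetta Code task page."""
        params = {"action": "parse", "page": page, "prop": "wikitext", "format": "json"}
        resp = requests.get(ROSETTA_API, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()["parse"]["wikitext"]["*"]

    def generate(self):
        """Yield one challenge per target language for the 'Hello world/Text' task."""
        spec = self.fetch_task("Hello_world/Text")
        for lang in self.LANGUAGES:
            yield {
                "name": f"hello_world_{lang.lower()}",
                "task": f"Write a {lang} program that prints 'Hello world!'.",
                "reference": spec,  # raw task page, kept around for grading/inspection
            }
```

The same pattern would extend to other task categories and, later, to other HTTP sources such as LeetCode, with only the fetch/parse step changing per source.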
Also, from a benchmarking standpoint (https://github.com/Significant-Gravitas/AutoGPT/tree/master/benchmark), it might make sense to take a look at coccinelle/spatch, which is commonly used by Linux kernel devs for semantic patching. Coccinelle patches are a good way to create patches that are valid and work, so we could use them as unit tests for benchmarking C-related refactorings:
- rename function/structure
- move function
- modify loop
- etc
These are the sorts of transformations that can be done directly with coccinelle, so we could use its output as unit tests for our agents (see the sketch below).
Again, the idea here would only be to generate a ton of benchmarks + unit tests semi-automagically for coding-related AI.
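For example, a refactoring challenge could use the diff produced by spatch as its ground truth, and the unit test simply compares the agent's patch against it. The helper names and wiring below are assumptions for illustration; the `spatch --sp-file` invocation is standard coccinelle usage, but flags vary between versions.

```python
# Hedged sketch: use coccinelle's output as the expected result of a C
# refactoring benchmark. Assumes `spatch` is installed and that the .cocci
# semantic patch describes the refactoring (e.g. a function rename).
import subprocess

def expected_refactoring_diff(cocci_patch: str, c_source: str) -> str:
    """Run spatch and return the unified diff it proposes (the 'ground truth')."""
    result = subprocess.run(
        ["spatch", "--sp-file", cocci_patch, c_source],
        capture_output=True, text=True, check=True,
    )
    return result.stdout  # spatch prints the proposed patch as a diff by default

def check_agent_refactoring(agent_diff: str, cocci_patch: str, c_source: str) -> bool:
    """Unit test: the agent's diff should match what coccinelle generates."""
    return agent_diff.strip() == expected_refactoring_diff(cocci_patch, c_source).strip()
```

In practice the comparison would probably need to be tolerant of whitespace and hunk ordering, but the principle stands: coccinelle produces a known-good patch cheaply, and the benchmark only has to check the agent against it.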
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.