AutoGPT
Enhanced coding capabilities ⌨️
Summary
Auto-GPT's coding skills aren't great when it comes to multidisciplinary projects.
Like #5179, the proposal here is to improve this with a sub-agent, either by integrating an existing third-party coding agent or by implementing our own.
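As a rough illustration of the sub-agent idea, here is a minimal sketch of how an existing third-party coding agent (Open Interpreter, in this example) could be wrapped behind a single "delegate a coding task" entry point. The wrapper function and how it would be registered as a command are hypothetical, not existing Auto-GPT APIs; the `interpreter.chat()` / `auto_run` usage is based on open-interpreter's documented interface and may differ between versions.

```python
# Hypothetical sketch: delegate a self-contained coding task to Open Interpreter.
# The function name and return handling are illustrative only; the interpreter
# entry point may differ between versions (`import interpreter` in older
# releases, `from interpreter import interpreter` in later ones).
import interpreter  # pip install open-interpreter

def run_coding_subagent(task: str) -> str:
    """Hand a coding task to the sub-agent and return its reply for the parent agent."""
    interpreter.auto_run = True  # execute generated code without interactive confirmation
    messages = interpreter.chat(task)  # recent versions return the new messages
    if not messages:
        return ""
    # Return the text of the last message as the sub-agent's result.
    last = messages[-1]
    return last.get("content", "") if isinstance(last, dict) else str(last)
```

Whether this runs in-process or as a separate tool/process is an open design question; the point is only that the parent agent needs a single well-defined hand-off to the coding sub-agent.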
Background
The following projects can be part of a solution, or can serve as inspiration.
- #5132
- #1299
Open Interpreter
Website: openinterpreter.com
Repo: KillianLucas/open-interpreter
https://github.com/KillianLucas/open-interpreter/assets/63927363/37152071-680d-4423-9af3-64836a6f7b60
GPT-Engineer
Repo: AntonOsika/gpt-engineer
https://github.com/AntonOsika/gpt-engineer/assets/4467025/6e362e45-4a94-4b0d-973d-393a31d92d9b
StarCoder
Repo: bigcode-project/starcoder
💫 StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues, commits, and notebooks. This repository showcases how we get an overview of this LM's capabilities.
Used in this PR:
- #4455
For the record, I talked to @merwanehamadi about implementing semi-automated benchmark generation using a new type of "BenchmarkGenerator" that would just be a stub to hook our benchmarks up to arbitrary HTTP APIs. The idea is to fetch tasks (= challenges) from coding competition sites (Rosetta Code, LeetCode, Kaggle, etc.) in order to procedurally generate a large number of coding-related benchmarks and use them to identify blind spots: #5536
I have shared some experimental proof-of-concept code to illustrate the concept. The idea would be to get this working for 1) Rosetta Code and 2) just the "Hello world" category, using a handful of mainstream languages (Python, JavaScript, PHP, Ruby), and then take it from there by adding more tasks from Rosetta Code.
Once that is working, we could look at adding support for LeetCode, etc.
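To make the "BenchmarkGenerator" idea concrete, here is a hedged proof-of-concept sketch. It assumes Rosetta Code's MediaWiki-style API at rosettacode.org and a simple dict-per-challenge layout; the class name, endpoint, and challenge schema are illustrative assumptions, not the benchmark harness's actual interface.

```python
# Sketch of the proposed "BenchmarkGenerator" stub: fetch tasks from an external
# HTTP API and turn them into challenge definitions. Class name, challenge dict
# layout, and the Rosetta Code endpoint are assumptions for illustration.
import requests

ROSETTA_API = "https://rosettacode.org/w/api.php"  # MediaWiki API; exact endpoint may differ

class RosettaCodeBenchmarkGenerator:
    """Generates 'Hello world' coding challenges from Rosetta Code task pages."""

    LANGUAGES = ["Python", "JavaScript", "PHP", "Ruby"]

    def fetch_task(self, page: str) -> str:
        """Fetch the raw wikitext of a Rosetta Code task page."""
        params = {"action": "parse", "page": page, "prop": "wikitext", "format": "json"}
        resp = requests.get(ROSETTA_API, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()["parse"]["wikitext"]["*"]

    def generate(self):
        """Yield one challenge per target language for the 'Hello world/Text' task."""
        spec = self.fetch_task("Hello_world/Text")
        for lang in self.LANGUAGES:
            yield {
                "name": f"hello_world_{lang.lower()}",
                "task": f"Write a {lang} program that prints 'Hello world!'.",
                "reference": spec,  # raw task page, kept around for grading/inspection
            }
```

The same pattern would extend to other task categories and, later, to other HTTP sources such as LeetCode, with only the fetch/parse step changing per source.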
Also, from a benchmarking standpoint (https://github.com/Significant-Gravitas/AutoGPT/tree/master/benchmark), it might make sense to take a look at coccinelle/spatch, which is commonly used by Linux kernel devs for semantic patching. Coccinelle patches are a good way to create patches that are valid and work, so we could use them as unit tests for benchmarking C-related refactorings:
- rename function/structure
- move function
- modify loop
- etc
These are the sorts of transformations that can be done directly with coccinelle, so we could use its output as unit tests for our agents (see the sketch below).
Again, the idea here would only be to generate a ton of benchmarks + unit tests semi-automagically for coding-related AI.
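For example, a refactoring challenge could use the diff produced by spatch as its ground truth, and the unit test simply compares the agent's patch against it. The helper names and wiring below are assumptions for illustration; the `spatch --sp-file` invocation is standard coccinelle usage, but flags vary between versions.

```python
# Hedged sketch: use coccinelle's output as the expected result of a C
# refactoring benchmark. Assumes `spatch` is installed and that the .cocci
# semantic patch describes the refactoring (e.g. a function rename).
import subprocess

def expected_refactoring_diff(cocci_patch: str, c_source: str) -> str:
    """Run spatch and return the unified diff it proposes (the 'ground truth')."""
    result = subprocess.run(
        ["spatch", "--sp-file", cocci_patch, c_source],
        capture_output=True, text=True, check=True,
    )
    return result.stdout  # spatch prints the proposed patch as a diff by default

def check_agent_refactoring(agent_diff: str, cocci_patch: str, c_source: str) -> bool:
    """Unit test: the agent's diff should match what coccinelle generates."""
    return agent_diff.strip() == expected_refactoring_diff(cocci_patch, c_source).strip()
```

In practice the comparison would probably need to be tolerant of whitespace and hunk ordering, but the principle stands: coccinelle produces a known-good patch cheaply, and the benchmark only has to check the agent against it.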
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.