gpt-engineer Improve existing codes

As requested in #79, #95 and #131, I built a proof of concept showing how we could add new functionality or fix errors in existing code.

To use the implementation, we need to do the following setup:

In a file called file_list.txt in the root of your project, add a list of code file paths, one per line. These files should give the AI agent enough context for the modification you want to make, but they do not need to form a complete, standalone program.
In the prompt file, describe the modification you want. It could be adding a docstring, adding new functions or classes, or even fixing a bug.
Start the modification process by running gpt_engineer projects/test --steps improve_code

At this stage of the implementation, all the code is sent in a single message, which is not ideal: it only works when the code is small enough to fit inside one prompt message. But it is a start.
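To make the single-message approach concrete, here is a minimal sketch of reading file_list.txt and joining the listed files into one message. It is not the code from this PR: the function names, the "File:" header format, and the max_chars budget are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path


def read_file_list(list_path: Path) -> list[Path]:
    """Return the paths listed one per line, skipping blank lines."""
    root = list_path.parent
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return [root / line.strip() for line in lines if line.strip()]


def build_code_message(paths: list[Path], root: Path, max_chars: int = 12_000) -> str:
    """Join every file into one message, each under its relative path.

    max_chars is a hypothetical budget standing in for the model's prompt limit.
    """
    parts = []
    for path in paths:
        code = path.read_text(encoding="utf-8")
        parts.append(f"File: {path.relative_to(root)}\n{code}")
    message = "\n\n".join(parts)
    if len(message) > max_chars:
        raise ValueError(
            f"code is {len(message)} characters, over the {max_chars} budget"
        )
    return message
```

The explicit size check mirrors the limitation described above: once the selected files no longer fit in one message, the call fails loudly instead of sending a truncated prompt.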

Here is a small example:

[Screenshots: notepad_NI3fbpIOyP, Code_1eWl7Oh2oB, Code_pnyLvtvN7X]

This PR is already a working solution. In a future change, I plan to split the code across separate messages and compare the results.

What do you guys think?

Jul 01 '23 23:07 leomariga

Hey @leomariga great initiative!

Question:

How do you see this being used?

Basically, how does one set up gpt-engineer for a new project?

I'm thinking two options:

create new project with prompt file + memory folder, where workspace symlinks to the existing codebase
gpt-engineer creates a .gpteng folder that contains prompt and memory

WDYT? Maybe there is a third?

Jul 02 '23 12:07 AntonOsika

Also general feedback:

Python style convention is to use snake case
I think requiring a file_paths.txt is too complex for the user :)

Jul 02 '23 12:07 AntonOsika

Created a discussion in discord around this

Jul 02 '23 13:07 AntonOsika

Thank you for your feedback and review =)

I agree with you that manually writing file_paths.txt is not the best user experience. I didn't put much thought into what the input method should be. I also like the idea of using the .gpteng folder.

(Brainstorming) We could do something like this:

At the root of the existing code repo, we call something like gpt-engineer -improve, which creates the .gpteng folder.
We write in the terminal what we want to modify. This string is stored in the prompt file in the .gpteng folder.
The project's file tree is displayed in the terminal and we select files by number (or we could even open the file explorer for the user!)
gpt-engineer does its job and automatically replaces the files at the end.
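The "select it with numbers" step could look roughly like the sketch below. The function names and the accepted input syntax (comma- or space-separated numbers, with ranges such as 3-4) are assumptions for illustration, not something this PR defines.

```python
from __future__ import annotations


def format_menu(files: list[str]) -> str:
    """Render the files as a numbered menu, one entry per line."""
    return "\n".join(f"{i}. {name}" for i, name in enumerate(files, start=1))


def parse_selection(text: str, count: int) -> list[int]:
    """Turn input such as "1, 3-4" into sorted zero-based indices."""
    chosen = set()
    for token in text.replace(",", " ").split():
        start, _, end = token.partition("-")
        low = int(start)  # raises ValueError on non-numeric input
        high = int(end) if end else low
        if not 1 <= low <= high <= count:
            raise ValueError(f"selection out of range: {token}")
        chosen.update(range(low - 1, high))
    return sorted(chosen)
```

A caller would print format_menu(files), read one line from the terminal, and pass it to parse_selection together with len(files).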

I saw the #contributors channel in the discord, but I don't have access to it yet. Let's wait to see the ideas there. =)

Jul 02 '23 22:07 leomariga

Hey, thanks for all the feedback and ideas. I'm working on your suggestions.

I was thinking we could rename chat_to_files.py to something like utils.py or tools.py and use it for utility functions, like reading, parsing, and writing files or strings.

The methods I put there do not match the file name. I think I should either move these methods or rename the file. What do you guys think?

I'm open to ideas.

Jul 03 '23 22:07 leomariga

I think what you say makes sense!

I really prefer specific file names, though, so it's easy to find things. Could we have chat_to_files and files_to_chat as separate files?

Very happy you're working on this; it's the top requested feature by far.

Jul 06 '23 21:07 AntonOsika

As for which files to add, I think we should keep it simple at first and just:

ask if all files in selected path argument should be included
if not, let user use arrow keys and enter to toggle which files/folders should be included (I believe there must be a cli arrow select dependency we could use)

Or we just do this: we run the tree command and open the output in a text editor. The user deletes rows from it, and everything that is not deleted when the editor is closed is used.
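The editor-based idea could be sketched as follows. This is only an illustration: the function name, the temporary file name, the use of the EDITOR environment variable, and the fallback to vi are all assumptions, and the editor launcher is injectable so the logic can be exercised without opening a real editor.

```python
from __future__ import annotations

import os
import shlex
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional


def select_by_editing(
    candidates: list[str],
    open_editor: Optional[Callable[[Path], None]] = None,
) -> list[str]:
    """Write candidates to a temp file, let the user delete rows, return the rest."""
    if open_editor is None:
        # Assumption: honour $EDITOR and fall back to vi (Unix-like systems).
        command = shlex.split(os.environ.get("EDITOR", "vi"))

        def open_editor(path: Path) -> None:
            subprocess.run(command + [str(path)], check=True)

    with tempfile.TemporaryDirectory() as tmp:
        listing = Path(tmp) / "files_to_include.txt"
        listing.write_text("\n".join(candidates) + "\n", encoding="utf-8")
        open_editor(listing)  # blocks until the editor is closed
        kept = listing.read_text(encoding="utf-8").splitlines()

    # Ignore blank lines and anything the user typed that was not offered.
    allowed = set(candidates)
    return [line.strip() for line in kept if line.strip() in allowed]
```

Filtering against the original candidates means a stray edit in the file cannot add a path that was never offered.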

What do you think?

Jul 06 '23 21:07 AntonOsika

There you go.

Now we only need to call gpt-engineer --improve in the root folder of an existing project. A .gpteng folder will be created by gpt-engineer.
I experimented with opening a selection window using Python's tkinter instead of selecting the files in the terminal. I think a GUI is much easier to use, but we could implement both if someone insists on using the terminal.

I posted a video of this feature working on the Discord.

Jul 09 '23 04:07 leomariga

I've started on a version of this. My approach is to start with the entry point(s), and iteratively include dependencies within the project. Currently creates .filename.meta files for each analyzed file; not unlike .h library files you would have in C describing the contents of the associated .c file.
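As a rough illustration of starting from an entry point and iteratively pulling in project-local dependencies, here is a sketch using Python's ast module. It is not the implementation described above: the function names are hypothetical, it only follows absolute imports that map to a single .py file under the project root (packages with __init__.py and relative imports are ignored), and it does not write any .meta files.

```python
from __future__ import annotations

import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the dotted module names imported by source (absolute imports only)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def collect_dependencies(entry: Path, root: Path) -> list[Path]:
    """Walk imports from entry and return every project file that is reached."""
    root = root.resolve()
    seen: list[Path] = []
    queue = [entry.resolve()]
    while queue:
        path = queue.pop()
        if path in seen:
            continue
        seen.append(path)
        for module in imported_modules(path.read_text(encoding="utf-8")):
            candidate = root / (module.replace(".", "/") + ".py")
            # Standard-library and third-party modules are not files under root.
            if candidate.is_file() and candidate not in seen:
                queue.append(candidate)
    return seen
```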

Jul 09 '23 05:07 hrobbins

Nice. The file selection on this solution is the function ask_for_files(). We can add more input methods there if we want =)

Jul 09 '23 14:07 leomariga

Left some comments!

Also adding some thought for future work:

would be great to automatically print the diffs from the generation and ask the human if they like the diffs or not
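One possible shape for this, sketched with the standard library's difflib. The function names and the y/N prompt are assumptions for illustration; the ask parameter defaults to input and exists only so the prompt can be replaced in tests.

```python
import difflib
from typing import Callable


def render_diff(old: str, new: str, filename: str) -> str:
    """Return a unified diff between the old and new file contents."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(lines)


def confirm_change(
    old: str, new: str, filename: str, ask: Callable[[str], str] = input
) -> bool:
    """Print the diff and return True only if the user accepts it."""
    diff = render_diff(old, new, filename)
    if not diff:
        return False  # nothing changed, so there is nothing to apply
    print(diff)
    answer = ask("Apply this change? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" keeps the existing file untouched unless the user explicitly accepts the generated change.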

Jul 09 '23 19:07 AntonOsika

Quick progress update. Now we can choose how to select the files as follows:

[Screenshot: mspaint_W5Z1xZnYoP]

Still working on other stuff.

Jul 11 '23 02:07 leomariga

A note for you, the naming convention for the methods is incorrect.

If that's important enough, maybe you'd want to fix it.

Jul 11 '23 03:07 lectair

I am also working on this feature. I have sent you a pull request, @leomariga. @AntonOsika, do you think it would be a good idea to create a branch for this?

Jul 11 '23 07:07 lectair

I think it would be worth creating a CLI that you can interact with while gpt-4 is editing the code, so you can iterate indefinitely and tell it to delete, add, or change anything. This could make the process of developing software really efficient. Something like ChatGPT, but with code. Like the code interpreter of ChatGPT Plus, but better :)

But to achieve this it would be essential to get it to always run the code correctly, without problems of paths, setting up files or typing extra commands.

Jul 11 '23 07:07 lectair

Tried out PR agent:

PR Analysis

🎯 Main theme: Adding functionality to improve existing code
🔍 Description and title: Yes
📌 Type of PR: Enhancement
🧪 Relevant tests added: No
✨ Focused PR: Yes, the PR is focused on adding a new feature to improve existing code. All changes are related to this feature.
🔒 Security concerns: No, the PR does not introduce any obvious security concerns. However, it's always a good practice to handle file operations carefully to prevent any potential security issues.

PR Feedback

💡 General PR suggestions: The PR is well-structured and the code changes are well-documented. However, it lacks tests to ensure the new functionality works as expected. It would be beneficial to add unit tests for the new functions and integration tests to ensure the new feature works with the existing code.
🤖 Code suggestions:
- relevant file: gpt_engineer/chat_to_files.py suggestion content: Consider handling exceptions when opening and writing to files. This can prevent the program from crashing if there are issues with file permissions or if the file does not exist. [important]
- relevant file: gpt_engineer/file_selector.py suggestion content: The ask_for_files function could be refactored to reduce its complexity. Consider breaking it down into smaller, more manageable functions. This would improve readability and maintainability. [medium]
- relevant file: gpt_engineer/main.py suggestion content: Consider adding a validation for the improve_option argument. If it is not a boolean, the program may behave unexpectedly. [medium]
- relevant file: gpt_engineer/steps.py suggestion content: The improve_existing_code function is quite long and does a lot of things. Consider breaking it down into smaller functions to improve readability and maintainability. [medium]
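The first suggestion (handling exceptions around file writes) could be addressed along these lines. This is a generic sketch, not the code in gpt_engineer/chat_to_files.py; the function name and the choice to report the error and return False are assumptions.

```python
from pathlib import Path


def write_file_safely(path: Path, content: str) -> bool:
    """Write content to path, reporting failure instead of crashing."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    except OSError as error:
        # Covers permission errors, missing or invalid parents, full disks, etc.
        print(f"Could not write {path}: {error}")
        return False
    return True
```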

Jul 12 '23 08:07 AntonOsika

Great to hear, screenshot looks good!

Jul 12 '23 09:07 AntonOsika

Hey @leomariga – any news on this?

Jul 16 '23 14:07 AntonOsika

Done. I added some considerations, @AntonOsika.

I just want to point out that @lectair made some interesting modifications in a PR to my branch, but I didn't get a response from him to the review I requested before merging it into my branch.

I think we can open a new PR with his changes in the official repo when this PR is merged.

Jul 16 '23 18:07 leomariga

Merged with commits from @lectair =)

Jul 21 '23 02:07 leomariga

Hey Leo great to see this ready to be merged.

Last step, which will take some time, is to merge / rebase in main and resolve conflicts so we can merge.

While doing it I have some quick final improvements:

Stop using a directory called workspace when we run the -i command: It should always be run from the directory that one is currently in. Currently, a folder called workspace is created.
See my comments in the PR

Also, small optional improvements:

Make the "user file_txt" option for selecting files first and default
Number the file selection alternatives from 1 (not 0)
Consider not listing any venv or node_modules folders (as they have so many files)
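The last two points could be sketched like this: prune the noisy folders while walking the tree, then number the remaining files from 1. The function name is hypothetical, and the extra .venv and .git entries in the skip set are assumptions beyond the venv and node_modules folders named above.

```python
from __future__ import annotations

import os

# venv and node_modules come from the comment above; .venv and .git are assumed extras.
SKIPPED_DIRS = {"venv", ".venv", "node_modules", ".git"}


def list_project_files(root: str) -> list[str]:
    """Return file paths relative to root, skipping the folders in SKIPPED_DIRS."""
    found = []
    for current, dirs, files in os.walk(root):
        # Editing dirs in place stops os.walk from descending into skipped folders.
        dirs[:] = sorted(d for d in dirs if d not in SKIPPED_DIRS)
        for name in sorted(files):
            found.append(os.path.relpath(os.path.join(current, name), root))
    return found


def print_numbered(files: list[str]) -> None:
    """Print the selection alternatives numbered from 1 rather than 0."""
    for number, name in enumerate(files, start=1):
        print(f"{number}. {name}")
```

Pruning during the walk, rather than filtering the results afterwards, avoids reading the thousands of entries these folders usually contain.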

Jul 23 '23 21:07 AntonOsika

One more thing:

We can simplify the "improve code" prompt, and use another final system message focused on explicitly requesting the output to be in the format we want, just like the gen_code step does (it uses the use_qa prompt).

I think we should do it.

When I tried it, it didn't give the right format (see screenshot)

Jul 23 '23 21:07 AntonOsika

Marking this as stale.

Would be great to pick it up, hopefully we don't have to close it.

Aug 05 '23 20:08 AntonOsika
