How to use crewAI for existing code with multiple files and nested folders?

hemangjoshi37a opened this issue 1 year ago • 8 comments

I want to use crewAI to develop an ongoing project that is complex and has a large codebase: many files and folders, multiple file types, and multiple programming languages.

How can I use crewAI in this scenario?

Please help me.

hemangjoshi37a, May 06 '24 12:05

Need more context as to what your problem is here.

theCyberTech, May 07 '24 00:05

According to my tests, if you give the LLMs more general tasks, they fail. If you are ready to split the load and do some of the work yourself, then go ahead: set up tools for reading directories and files, plus the appropriate agents. But don't expect miracles.
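
A minimal sketch of such file-access helpers, assuming the whole project lives under one root folder on disk. The names list_project_files and read_project_file, the suffix allow-list, and the skipped folders are all hypothetical; in crewAI each function would still need to be wrapped as a tool (for example by subclassing crewai_tools.BaseTool, as the merge tool later in this thread does).

```python
from pathlib import Path

# Hypothetical allow-list and skip-list; adjust to the languages in your project.
ALLOWED_SUFFIXES = {".py", ".ts", ".go", ".json", ".md", ".txt"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def list_project_files(root: Path) -> list[str]:
    """Return relative paths of allowed files under root, skipping vendored dirs."""
    root = root.resolve()
    files = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file() and path.suffix.lower() in ALLOWED_SUFFIXES:
            files.append(rel.as_posix())
    return files


def read_project_file(root: Path, relative: str, max_chars: int = 20_000) -> str:
    """Read one file, refusing paths that escape root; truncate long files."""
    root = root.resolve()
    target = (root / relative).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{relative} is outside the project folder")
    text = target.read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]
```

Returning only relative paths and truncating long files is meant to keep the agent's context small, which matters most for a large multi-language codebase like the one described in the question.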

deadlious, May 07 '24 07:05

@theCyberTech My understanding of the current state of crewAI is that it works for new projects and completes the given task. But how can I use it to modify existing code?

hemangjoshi37a, May 07 '24 07:05

Modifying existing code works with varying success. I've written several tools that provide options for reading and modifying existing files. In the long run, it ends up with messed-up code. It's extremely difficult, as I constantly run into the following problems:

  1. Replacing a line with source code -> improper indentation
  2. Inserting at a specific line index -> wrong line index
  3. Simple merging based on difflib -> the tool cannot determine the proper merge location
  4. AST-based merging -> when the provided code has empty methods (placeholders, etc.), it overwrites the existing code
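
One possible mitigation for problems 1 and 2, sketched here as an assumption rather than something tested in this thread: anchor the edit on the text of the line instead of its index, and re-indent the replacement to match the line it replaces. replace_line_preserving_indent is a hypothetical helper.

```python
import textwrap


def replace_line_preserving_indent(source: str, old_line: str, new_code: str) -> str:
    """Replace the one line whose stripped text equals old_line with new_code.

    The edit is anchored on the line's content rather than its index, and
    new_code is re-indented to the indentation of the line it replaces.
    """
    target = old_line.strip()
    if not target:
        raise ValueError("old_line must contain code, not just whitespace")
    lines = source.splitlines(keepends=True)
    matches = [i for i, line in enumerate(lines) if line.strip() == target]
    if len(matches) != 1:
        # Refuse to guess: zero or several matches means the anchor is unreliable
        raise ValueError(f"expected one line matching {target!r}, found {len(matches)}")
    i = matches[0]
    indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
    # Normalise the agent's snippet, then indent every non-blank line to match
    block = textwrap.indent(textwrap.dedent(new_code).strip("\n"), indent)
    lines[i] = block + "\n"
    return "".join(lines)


source = "def f(x):\n    if x:\n        return 1\n    return 0\n"
print(replace_line_preserving_indent(source, "return 1", "y = x * 2\nreturn y"))
```

Raising on ambiguous matches trades convenience for safety: the agent has to quote a unique line, which may be easier for it to get right than a line number.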

Not to mention that I constantly run into problems with the models losing the task context, or deviating from and generalizing the task's purpose, so they never end up producing proper code.

I've spent a month of my spare time testing different models (GPT-3.5, GPT-4, Llama 3, Mixtral, and a bunch of smaller models), trying to make them build a Pong game in pygame, which is generally under 200 lines of code. I provided an existing menu and scoreboard and requested only the game logic. None worked. Even when I write and order the specific implementation tasks myself, instead of leaving that work to the crew, all of these models fail to produce working code.

The latest tool I used for merging code (number 4) is:

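import ast
from pathlib import Path
from typing import Type

import astor
from crewai_tools import BaseTool
from pydantic.v1 import BaseModel, Field

# Assumption: the folder the tools are sandboxed to. The original definition
# of project_folder is not shown in this comment.
project_folder = Path.cwd()


def check_for_forbidden_files(file: Path) -> None:
    # Placeholder: the original implementation is not shown. Raise here for
    # files the agent must never modify.
    pass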


def validate_file(
    filename, allow_non_existing: bool = False, allow_existing: bool = True
) -> Path:
    if filename is None:
        raise FileNotFoundError("No filename was provided.")
    file = Path(filename.replace(r"\_", "_"))
    file = project_folder / f"{file.stem}{file.suffix}"
    if not file.is_file():
        if Path(project_folder / filename).is_file():
            raise ReferenceError(f"Access denied to {filename}")
        if not allow_non_existing:
            raise FileNotFoundError(
                f"File {file.name} was not found. You cannot read/write nonexistent files with this tool"
            )
    elif not allow_existing:
        raise ReferenceError("This tool does not allow overwriting existing files. Perhaps read the file first?")
    if file.suffix.casefold() in {".py", ".json", ".txt"}:
        return file
    else:
        raise AttributeError(
            f"File extension {file.suffix} is not allowed. Only ['.py', '.txt', '.json'] allowed."
        )


def validate_content(filecontent: str) -> str:
    if filecontent is None:
        raise AttributeError(
            "Please provide file content. You are not allowed to create empty files."
        )
    replace_list = {
        r"\\_": "_",
        r"\_": "_",
        r"\@": "@",
        "**init**": "__init__",
        "**name**": "__name__",
        "**main**": "__main__",
        "\t": "    ",
    }
    if "```python" in filecontent:
        filecontent = filecontent.strip().removeprefix("```python").removesuffix("```")
    for o, n in replace_list.items():
        filecontent = filecontent.replace(o, n)
    return filecontent + "\n"


class MergeCodeASTArgs(BaseModel):
    filename: str = Field(..., description="Mandatory file to put the source code in")
    sourcecode: list[str] = Field(..., description="Mandatory source code to merge into the file")


class MergeCodeAST(BaseTool):
    args_schema: Type[BaseModel] = MergeCodeASTArgs
    name: str = "Merge code into file"
    description: str = (
        "Intelligently merge source code into a file. If there are conflicts, the conflicting code is replaced."
    )

    def manual_run(self, filename: str, sourcecode: str) -> str:
        print(f"Merging code: {filename=}, {sourcecode=}")
        file = validate_file(filename)
        check_for_forbidden_files(file)

        # Read the existing Python file
        with open(file, "r") as fx:
            original_code = fx.read()

        # Parse the original and new code into ASTs
        original_tree = ast.parse(original_code)
        new_tree = ast.parse(validate_content(sourcecode))

        # Function to find a node by its name in a given list
        def find_node_by_name(node_list, name):
            for node in node_list:
                if hasattr(node, "name") and node.name == name:
                    return node
            return None

        # Collect all imports and add them to a set to avoid duplicates
        import_nodes = [
            node
            for node in original_tree.body
            if isinstance(node, (ast.Import, ast.ImportFrom))
        ]
        import_names = {ast.dump(node) for node in import_nodes}
        new_imports = [
            node
            for node in new_tree.body
            if isinstance(node, (ast.Import, ast.ImportFrom))
        ]
        for imp in new_imports:
            if ast.dump(imp) not in import_names:
                import_nodes.append(imp)
                import_names.add(ast.dump(imp))

        # Handle other definitions (functions and classes)
        merged_nodes = import_nodes  # start with imports
        new_definitions = [
            node
            for node in new_tree.body
            if not isinstance(node, (ast.Import, ast.ImportFrom))
        ]
        existing_definitions = [
            node
            for node in original_tree.body
            if not isinstance(node, (ast.Import, ast.ImportFrom))
        ]

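        for new_node in new_definitions:
            # Only named nodes (functions, classes) can conflict; others are appended
            new_name = getattr(new_node, "name", None)
            existing_node = (
                find_node_by_name(existing_definitions, new_name) if new_name else None
            )
            if existing_node:
                # Replace conflicting functions (or empty classes) in place,
                # keeping the original order of definitions in the file
                if isinstance(new_node, (ast.FunctionDef, ast.AsyncFunctionDef)) or (
                    isinstance(new_node, ast.ClassDef) and not new_node.body
                ):
                    idx = existing_definitions.index(existing_node)
                    existing_definitions[idx] = new_node
                elif isinstance(new_node, ast.ClassDef):  # Merge class content
                    for new_subnode in new_node.body:
                        sub_name = getattr(new_subnode, "name", None)
                        existing_subnode = (
                            find_node_by_name(existing_node.body, sub_name)
                            if sub_name
                            else None
                        )
                        if existing_subnode:
                            # Replace the existing member in place
                            idx = existing_node.body.index(existing_subnode)
                            existing_node.body[idx] = new_subnode
                        else:
                            existing_node.body.append(new_subnode)
            else:
                existing_definitions.append(new_node)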

        # Combine the import nodes with the updated definitions
        merged_nodes.extend(existing_definitions)

        # Generate the new source code
        original_tree.body = merged_nodes
        new_source_code = astor.to_source(original_tree)

        # Write the new code back to the file
        with open(file, "w") as fx:
            fx.write(new_source_code)

        return f"Source code successfully merged into {filename}"

    def _run(self, filename: str, sourcecode: list[str]) -> str:
        # Return the result message so the agent can see whether the merge succeeded
        return self.manual_run(filename, "".join(validate_content(_) for _ in sourcecode))

    def check_exists(self, tree, item):
        for ex_item in tree.body:
            ex_name = getattr(ex_item, "name", None)
            name = getattr(item, "name", None)
            ex_names = getattr(ex_item, "names", None)
            names = getattr(item, "names", None)
            if (
                ex_name
                and name
                and ex_item.name == item.name
                and isinstance(ex_item, type(item))
            ):
                return ex_item
            elif (
                ex_names
                and names
                and set(_.name for _ in names).issubset(_.name for _ in ex_names)
            ):
                return ex_item
        return None
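
One side effect of approach 4 worth knowing: the tool regenerates the whole file from the AST, and Python's AST does not store comments, so every comment in the target file is lost on each merge. This sketch uses the standard library's ast.unparse (Python 3.9+) instead of astor; astor.to_source also works only from the AST, so it cannot preserve comments either.

```python
import ast

original = '''\
import os


def helper():
    # explains a tricky edge case
    return os.sep  # platform-specific separator
'''

# Round-trip through the AST, as an AST-based merge does when it rewrites the file
regenerated = ast.unparse(ast.parse(original))
print(regenerated)
print("comments kept:", "#" in regenerated)  # comments kept: False
```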

deadlious, May 07 '24 09:05

If you get this code merge tool working most of the time, consider contributing to:

https://github.com/joaomdmoura/crewAI-tools/

slavakurilyak, May 12 '24 17:05

The tool works fine, although it is a bit specific. If I decide to add it to crewai_tools, I'll probably have to make it more generic. My problem with this tool is that agents don't work well with it. If an agent provides the following snippet:

class MyClass:
    def __init__(self):
        # previous implementation
        ...

    def methodX(self, a):
        self.sum += a
        return self.sum

the tool would overwrite the __init__ method, because I cannot enumerate every possible placeholder comment to detect that the agent is not trying to replace the code, only indicating that this method remains as is. And believe me, I tried with many models. All of them failed miserably at working with existing code. They can read it, and they can provide guidance on how to fix or enhance things, but they are unable to adjust it on their own.
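
A possible workaround, offered as an untested assumption: since comments never reach the AST, a merge tool could ignore the comment wording entirely and instead skip replacing any method whose new body is structurally empty (only pass, ..., or a docstring). Note that a body containing only a comment is not valid Python, so ast.parse rejects it; the agent has to emit at least pass or ... for this check to apply. is_placeholder_body is a hypothetical helper.

```python
import ast


def is_placeholder_body(func: ast.AST) -> bool:
    """True if a function body holds nothing but `pass`, `...`, or a docstring."""
    for stmt in getattr(func, "body", []):
        if isinstance(stmt, ast.Pass):
            continue
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            # `...` parses to Constant(Ellipsis); a docstring to Constant(str)
            if stmt.value.value is Ellipsis or isinstance(stmt.value.value, str):
                continue
        return False
    return True


# Agent output: __init__ is a placeholder meaning "keep the existing code"
snippet = '''
class MyClass:
    def __init__(self):
        # previous implementation
        ...

    def methodX(self, a):
        self.sum += a
        return self.sum
'''

cls = ast.parse(snippet).body[0]
for member in cls.body:
    if isinstance(member, (ast.FunctionDef, ast.AsyncFunctionDef)):
        # A merge tool would skip members whose new body is only a placeholder
        print(member.name, "keep existing" if is_placeholder_body(member) else "replace")
```

With this check, __init__ is reported as "keep existing" and methodX as "replace", regardless of what the placeholder comment says.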

deadlious, May 12 '24 19:05

Have you tried enforcing structured responses for syntax using pydantic or instructor?
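
A dependency-free sketch of that idea, assuming the agent is asked to reply with JSON in a fixed schema. With pydantic or instructor, CodeEdit would be a model and validation would come from its field types; here a dataclass plus manual checks stands in so the example runs with only the standard library. CodeEdit, parse_edits, and the action names are hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: every edit names one symbol and an explicit action,
# so "leave this method alone" is a field value rather than a comment.
ALLOWED_ACTIONS = {"replace", "insert", "keep"}


@dataclass(frozen=True)
class CodeEdit:
    filename: str
    symbol: str  # e.g. "MyClass.__init__"
    action: str  # one of ALLOWED_ACTIONS
    code: str = ""


def parse_edits(raw: str) -> list[CodeEdit]:
    """Parse an LLM reply as a JSON list of edits; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of edits")
    edits = []
    for item in data:
        if not isinstance(item, dict) or not all(isinstance(v, str) for v in item.values()):
            raise ValueError("each edit must be a JSON object with string values")
        try:
            edit = CodeEdit(**item)
        except TypeError as exc:  # missing or unexpected keys
            raise ValueError(f"bad edit fields: {exc}") from exc
        if edit.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action {edit.action!r}")
        if edit.action != "keep" and not edit.code.strip():
            raise ValueError(f"{edit.symbol}: action {edit.action!r} needs code")
        edits.append(edit)
    return edits


reply = (
    '[{"filename": "game.py", "symbol": "MyClass.__init__", "action": "keep"},'
    ' {"filename": "game.py", "symbol": "MyClass.methodX", "action": "replace",'
    ' "code": "def methodX(self, a):\\n    self.sum += a\\n    return self.sum\\n"}]'
)
edits = parse_edits(reply)
print([(e.symbol, e.action) for e in edits])
```

A rejected reply can be sent back to the agent with the ValueError message, so the model corrects its output instead of the merge tool guessing what a comment meant.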

slavakurilyak, May 12 '24 23:05

OK, apart from this, I found an interesting GitHub app that resolves issues using AI: https://github.com/apps/sweep-ai. Please consider adding it to this repo if possible. Not sponsored or anything, but I found it very interesting and helpful.

hemangjoshi37a, May 15 '24 13:05

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot], Aug 17 '24 12:08

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot], Aug 23 '24 12:08