Error when running data_ingestion.py in version 0.3.0 utilizing local memory cache

Open TechnicalParadox opened this issue 1 year ago • 7 comments

When running the data_ingestion.py script in Auto-GPT version 0.3.0, I encountered a TypeError that did not occur in version 0.2. Here are the steps to reproduce the issue:

  • Clone the updated repository for version 0.3.0
  • Run python data_ingestion.py --dir . --init

Logs:

PS D:\Auto-GPT-0.3.0> python data_ingestion.py --dir . --init
Traceback (most recent call last):
  File "D:\Auto-GPT-0.3.0\data_ingestion.py", line 96, in <module>
    main()
  File "D:\Auto-GPT-0.3.0\data_ingestion.py", line 73, in main
    memory = get_memory(cfg, init=args.init)
  File "D:\Auto-GPT-0.3.0\autogpt\memory\__init__.py", line 78, in get_memory
    memory = LocalCache(cfg)
  File "D:\Auto-GPT-0.3.0\autogpt\singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "D:\Auto-GPT-0.3.0\autogpt\memory\local.py", line 41, in __init__
    workspace_path = Path(cfg.workspace_path)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 871, in __new__
    self = cls._from_parts(args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 509, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 493, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

TechnicalParadox avatar May 03 '23 04:05 TechnicalParadox

I traced the error back to Auto-GPT-0.3.0\autogpt\config\config.py

I noticed that in version 0.3.0, lines 20 and 21 read:

self.workspace_path = None
self.file_logger_path = None

but version 0.2.2 did not have this.
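
For reference, the failure can be reproduced in isolation: handing None to pathlib.Path raises a TypeError like the one in the traceback. This is a minimal sketch; the variable name workspace_path is only a stand-in for cfg.workspace_path, and the exact wording of the message differs between Python versions.

```python
from pathlib import Path

# Stand-in for cfg.workspace_path when it is left at its default of None.
workspace_path = None

try:
    Path(workspace_path)
    error_message = ""
except TypeError as exc:
    # Path() rejects None because it is not a str or os.PathLike object.
    error_message = str(exc)

print(error_message)
```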

A temporary solution until someone solves this issue

Edit lines 20-21 of autogpt/config/config.py to include the paths. For example:

self.workspace_path = "D:\\Auto-GPT-0.3.0\\autogpt\\auto_gpt_workspace"
self.file_logger_path = "D:\\Auto-GPT-0.3.0\\autogpt\\auto_gpt_workspace\\file-logger.txt"

Edit

The change above only resolves the TypeError; ingestion still did not succeed. In addition to the edits above, you must change line 84 of Auto-GPT-0.3.0/data_ingestion.py to the following:

ingest_directory(cfg.workspace_path+"\\"+args.dir, memory, args)
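
The same join can probably be written without hard-coding the "\\" separator by using pathlib. This is only a sketch: resolve_ingest_dir is a hypothetical helper, not something that exists in Auto-GPT, and the example paths are placeholders.

```python
from pathlib import Path


def resolve_ingest_dir(workspace_path: str, relative_dir: str) -> str:
    """Join the workspace root and a directory relative to it.

    Hypothetical helper: Path's "/" operator picks the right separator
    for the current platform, so no "\\" needs to be spelled out.
    """
    return str(Path(workspace_path) / relative_dir)


# Placeholder values standing in for cfg.workspace_path and args.dir.
ingest_dir = resolve_ingest_dir("auto_gpt_workspace", "data")
print(ingest_dir)
```

With a helper like this, the edited line would pass resolve_ingest_dir(cfg.workspace_path, args.dir) as the first argument instead of the concatenated string.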

TechnicalParadox avatar May 03 '23 05:05 TechnicalParadox

Running into the same issue. I thought it might be because they moved the workspace folder, and it seems you're correct. But after making these changes it still does not ingest the information, even though it says it does: "INFO Directory 'data' ingested successfully."

FYI: the file you need to edit is a .py, not a .js :-)

joshgaskin avatar May 03 '23 07:05 joshgaskin

A few hours into troubleshooting this, it seems like the files are not being split into chunks correctly. Note that for each file it says "Done ingesting 0 chunks". Line 183 of file_operations.py is where the chunks list is created, but it always has a length of 0, so

for i, chunk in enumerate(chunks):
    logger.info(f"Ingesting chunk {i + 1} / {num_chunks} into memory")
    memory_to_add = (
        f"Filename: {filename}\n" f"Content part#{i + 1}/{num_chunks}: {chunk}"
    )
    memory.add(memory_to_add)

never runs
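
One way this symptom could arise is if the content handed to the splitter is empty, for example because the file was looked up at the wrong path. The sketch below illustrates that with a hypothetical split_with_overlap function; it is not Auto-GPT's actual splitter, only an illustration of chunking with a maximum length and an overlap.

```python
def split_with_overlap(content: str, max_length: int = 4000, overlap: int = 200) -> list[str]:
    """Split content into chunks of at most max_length characters.

    Hypothetical illustration: consecutive chunks share `overlap` characters.
    """
    if max_length <= overlap:
        raise ValueError("max_length must be greater than overlap")

    step = max_length - overlap
    chunks = []
    start = 0
    while start < len(content):
        chunks.append(content[start:start + max_length])
        if start + max_length >= len(content):
            break  # the last chunk already reaches the end of the content
        start += step
    return chunks


# Empty content produces no chunks, so a loop over the chunks never runs.
print(len(split_with_overlap("")))
print(split_with_overlap("0123456789", max_length=4, overlap=1))
```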

TechnicalParadox avatar May 03 '23 07:05 TechnicalParadox

As 0.2.2 iterates through its loop, memory is immediately added to the local cache. 0.3.0 fails to add anything to the LocalCache file auto-gpt.json; the file stays {} indefinitely. There may be multiple issues here. Perhaps multiple files reference an incorrect path for the workspace and/or the auto-gpt.json location, but I assume this is only an issue when working with local memory storage.

TechnicalParadox avatar May 03 '23 07:05 TechnicalParadox

    if workspace_directory is None:
        workspace_directory = Path(__file__).parent / "auto_gpt_workspace"
    else:
        workspace_directory = Path(workspace_directory)

That happens only in main.py, not in data_ingestion.py, and nothing there sets a default, so it throws... :/
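
The fallback shown above could be factored into a small function so that any entry point can reuse it. This is a sketch under assumptions: resolve_workspace_directory and its base_dir parameter are hypothetical names, not part of Auto-GPT.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_workspace_directory(
    workspace_directory: Optional[Union[str, Path]], base_dir: Path
) -> Path:
    """Return the given workspace directory, or a default under base_dir.

    Hypothetical helper mirroring the None check quoted above.
    """
    if workspace_directory is None:
        return base_dir / "auto_gpt_workspace"
    return Path(workspace_directory)


# With no explicit directory, the default under base_dir is used.
print(resolve_workspace_directory(None, Path("autogpt")))
print(resolve_workspace_directory("my_workspace", Path("autogpt")))
```

A script such as data_ingestion.py could then assign the result to cfg.workspace_path before asking for the memory backend, assuming nothing else sets it earlier.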

k-boikov avatar May 03 '23 20:05 k-boikov

Same problem here

Snowphoenixfire avatar May 04 '23 04:05 Snowphoenixfire

I get this error too:

C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0>python data_ingestion.py --dir DataFolder --init
Traceback (most recent call last):
  File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\data_ingestion.py", line 96, in <module>
    main()
  File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\data_ingestion.py", line 73, in main
    memory = get_memory(cfg, init=args.init)
  File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\autogpt\memory\__init__.py", line 78, in get_memory
    memory = LocalCache(cfg)
  File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\autogpt\singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\autogpt\memory\local.py", line 41, in __init__
    workspace_path = Path(cfg.workspace_path)
  File "C:\Python310\lib\pathlib.py", line 958, in __new__
    self = cls._from_parts(args)
  File "C:\Python310\lib\pathlib.py", line 592, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "C:\Python310\lib\pathlib.py", line 576, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

I've tried everything, but nothing helps.

anunknowperson avatar May 14 '23 14:05 anunknowperson

Seems like I found a fix:

Here's what I did so far:

  • Added WORKSPACE_PATH variable (forgot to add FILE_LOGGER_PATH) to .env file
  • Changed the Config class so those paths have default values: self.workspace_path = os.getenv("WORKSPACE_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace"))
  • The filenames were relative to the workspace directory, so I switched to read from the joined path of the workspace directory + filename: content = read_file(os.path.join(CFG.workspace_path, filename))
  • Updated the init function for the LocalCache class to avoid wiping the local memory every time it's instantiated.

.env file:
## WORKSPACE_PATH - The workspace path to use
WORKSPACE_PATH=./autogpt/auto_gpt_workspace

config.py:
class Config(metaclass=Singleton):
    """
    Configuration class to store the state of bools for different scripts access.
    """

    def __init__(self) -> None:
        """Initialize the Config class"""
        self.workspace_path = os.getenv("WORKSPACE_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace"))
        self.file_logger_path = os.getenv("FILE_LOGGER_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace", "file-logger.txt"))

file_operations.py:
def ingest_file(
    filename: str, memory, max_length: int = 4000, overlap: int = 200
) -> None:
    """
    Ingest a file by reading its content, splitting it into chunks with a specified
    maximum length and overlap, and adding the chunks to the memory storage.

    :param filename: The name of the file to ingest
    :param memory: An object with an add() method to store the chunks in memory
    :param max_length: The maximum length of each chunk, default is 4000
    :param overlap: The number of overlapping characters between chunks, default is 200
    """
    try:
        logger.info(f"Working with file {filename}")
        # content = read_file(filename)
        content = read_file(os.path.join(CFG.workspace_path, filename))

local.py:
def __init__(self, cfg) -> None:
    """Initialize a class instance

    Args:
        cfg: Config object

    Returns:
        None
    """
    workspace_path = Path(cfg.workspace_path)
    self.filename = workspace_path / f"{cfg.memory_index}.json"

    if not self.filename.exists():
        # Create an empty file if it doesn't exist
        self.data = CacheContent()
        with self.filename.open("w+b") as f:
            f.write(b"{}")
    else:
        # If file exists, load its contents
        with self.filename.open("rb") as f:
            file_content = orjson.loads(f.read())
            # Loading the CacheContent object from the file
            self.data = CacheContent(texts=file_content.get("texts", []),
                                     embeddings=np.array(file_content.get("embeddings", create_default_embeddings())))

Edit: updated the LocalCache init function in autogpt/memory/local.py
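
The load-or-create idea behind that init change can be tried in isolation. The sketch below uses only the standard library (json rather than orjson) and a temporary directory; load_or_create_cache is a hypothetical name, and plain dicts stand in for CacheContent.

```python
import json
import tempfile
from pathlib import Path


def load_or_create_cache(path: Path) -> dict:
    """Create an empty JSON cache file if missing, otherwise load it.

    Hypothetical helper: existing contents are never overwritten on load.
    """
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("{}", encoding="utf-8")
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "auto-gpt.json"
    first = load_or_create_cache(cache_file)  # file is created as {}
    cache_file.write_text(json.dumps({"texts": ["hello"]}), encoding="utf-8")
    second = load_or_create_cache(cache_file)  # existing data is preserved

print(first, second)
```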

rolandog avatar May 20 '23 12:05 rolandog

Is there a PR for this, or is it waiting on something?

coolrazor007 avatar Jun 11 '23 05:06 coolrazor007

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] avatar Sep 06 '23 20:09 github-actions[bot]

This issue was closed automatically because it has been stale for 10 days with no activity.

github-actions[bot] avatar Sep 17 '23 01:09 github-actions[bot]