AutoGPT
Error when running data_ingestion.py in version 0.3.0 utilizing local memory cache
When running the data_ingestion.py script in Auto-GPT version 0.3.0, I encountered a TypeError that did not occur in version 0.2. Here are the steps to reproduce the issue:
Clone the updated repository for version 0.3.0
Run python data_ingestion.py --dir . --init
Logs:
PS D:\Auto-GPT-0.3.0> python data_ingestion.py --dir . --init
Traceback (most recent call last):
  File "D:\Auto-GPT-0.3.0\data_ingestion.py", line 96, in <module>
    main()
  File "D:\Auto-GPT-0.3.0\data_ingestion.py", line 73, in main
    memory = get_memory(cfg, init=args.init)
  File "D:\Auto-GPT-0.3.0\autogpt\memory\__init__.py", line 78, in get_memory
    memory = LocalCache(cfg)
  File "D:\Auto-GPT-0.3.0\autogpt\singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "D:\Auto-GPT-0.3.0\autogpt\memory\local.py", line 41, in __init__
    workspace_path = Path(cfg.workspace_path)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 871, in __new__
    self = cls._from_parts(args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 509, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\pathlib.py", line 493, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
I traced the error back to:
Auto-GPT-0.3.0\autogpt\config\config.py
I noticed that in version 0.3.0, lines 20 and 21 read:
self.workspace_path = None
self.file_logger_path = None
but version 0.2.2 did not have this.
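The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, assuming only that the config value is None when it reaches Path(); the variable name is illustrative and this is not Auto-GPT code. The exact wording of the error message differs between Python versions, so only the exception type is checked.

```python
from pathlib import Path

# Stand-in for cfg.workspace_path as initialized in 0.3.0 (assumption
# based on the report above: the attribute is None when LocalCache runs).
workspace_path = None

try:
    Path(workspace_path)
    outcome = "no error"
except TypeError as exc:
    # pathlib rejects None because it is not str, bytes or os.PathLike.
    outcome = f"TypeError: {exc}"

print(outcome)
```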
A temporary solution until someone solves this issue
Edit lines 20-21 of autogpt/config/config.py to include the paths. For example:
self.workspace_path = "D:\\Auto-GPT-0.3.0\\autogpt\\auto_gpt_workspace"
self.file_logger_path = "D:\\Auto-GPT-0.3.0\\autogpt\\auto_gpt_workspace\\file-logger.txt"
Edit
The above only resolves the TypeError; ingestion still did not succeed.
In addition to the edits above, you must change line 84 of Auto-GPT-0.3.0/data_ingestion.py to the following:
ingest_directory(cfg.workspace_path+"\\"+args.dir, memory, args)
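Concatenating with "\\" only works on Windows. A portable alternative is to join the two parts with pathlib, sketched below. The helper name resolve_ingest_dir is hypothetical and not part of Auto-GPT; it assumes the --dir argument is meant to be relative to the workspace.

```python
from pathlib import Path


def resolve_ingest_dir(workspace_path: str, relative_dir: str) -> Path:
    """Join a directory given on the command line onto the workspace path.

    Hypothetical helper for illustration. Note that if relative_dir is an
    absolute path, pathlib discards workspace_path and returns relative_dir.
    """
    return Path(workspace_path) / relative_dir


# "." collapses to the workspace itself; "data" becomes a subdirectory.
print(resolve_ingest_dir("auto_gpt_workspace", "."))
print(resolve_ingest_dir("auto_gpt_workspace", "data"))
```

The call on line 84 could then pass str(resolve_ingest_dir(cfg.workspace_path, args.dir)), assuming ingest_directory expects a string.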
I'm running into the same issue. I thought it might be because they moved the workspace folder, and it seems you're correct. But after making these changes it still does not ingest the information, even though it reports that it does: "INFO Directory 'data' ingested successfully."
FYI: the file you need to edit is a .py, not a .js :-)
A few hours into troubleshooting this, it seems the files are not being split into chunks correctly. Note that for each file it says "Done ingesting 0 chunks". Line 183 of file_operations.py is where the chunks list is created, but it always has a length of 0, so
for i, chunk in enumerate(chunks):
    logger.info(f"Ingesting chunk {i + 1} / {num_chunks} into memory")
    memory_to_add = (
        f"Filename: {filename}\n" f"Content part#{i + 1}/{num_chunks}: {chunk}"
    )
    memory.add(memory_to_add)
never runs
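To illustrate how a chunk list can end up empty, here is a rough sketch of splitting text into overlapping chunks. It is not Auto-GPT's actual splitting code; the function name split_text is hypothetical, and only the max_length and overlap parameters mirror the ingest_file signature. With empty content the generator yields nothing, so a loop over the chunks would never execute. Whether empty content is what actually happens in 0.3.0 is not established here.

```python
from typing import Iterator


def split_text(text: str, max_length: int = 4000, overlap: int = 200) -> Iterator[str]:
    """Yield chunks of at most max_length characters, each overlapping
    the previous one by `overlap` characters (illustrative sketch only)."""
    if max_length <= overlap:
        raise ValueError("max_length must be greater than overlap")
    step = max_length - overlap
    start = 0
    while start < len(text):
        yield text[start:start + max_length]
        if start + max_length >= len(text):
            break  # the current chunk already reached the end of the text
        start += step


empty_chunks = list(split_text(""))
small_chunks = list(split_text("abcdefghij", max_length=4, overlap=1))
print(len(empty_chunks), small_chunks)
```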
As 0.2.2 iterates through its loop, memory is immediately added to the local cache.
0.3.0 fails to add anything to the LocalCache file auto-gpt.json; the file stays {} indefinitely. There may be multiple issues here. Perhaps multiple files reference an incorrect path for the workspace and/or the auto-gpt.json location, but I assume this is only an issue when working with local memory storage.
if workspace_directory is None:
    workspace_directory = Path(__file__).parent / "auto_gpt_workspace"
else:
    workspace_directory = Path(workspace_directory)
That happens only in main.py, not in data_ingestion.py, and nothing there sets a default, so it throws... :/
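One way to avoid the divergence would be to put the fallback in a small shared function that both entry points call. The sketch below assumes this; the function name resolve_workspace and the base_dir parameter are hypothetical and do not exist in Auto-GPT.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_workspace(workspace_directory: Optional[PathLike], base_dir: PathLike) -> Path:
    """Return the configured workspace, or a default under base_dir.

    Hypothetical helper mirroring the main.py snippet above, so that a
    script such as data_ingestion.py could reuse the same default.
    """
    if workspace_directory is None:
        return Path(base_dir) / "auto_gpt_workspace"
    return Path(workspace_directory)


print(resolve_workspace(None, "autogpt"))
print(resolve_workspace("custom_workspace", "autogpt"))
```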
Same problem here
I get this error too:
C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0>python data_ingestion.py --dir DataFolder --init
Traceback (most recent call last):
File "C:\Users\Sereja\Downloads\Auto-GPT-0.3.0\Auto-GPT-0.3.0\data_ingestion.py", line 96, in
I've tried everything, but nothing helps.
Seems like I found a fix:
Here's what I did so far:
- Added a WORKSPACE_PATH variable to the .env file (forgot to add FILE_LOGGER_PATH).
- Gave those paths a default value in the Config class:
self.workspace_path = os.getenv("WORKSPACE_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace"))
- The filenames were relative to the workspace directory, so I switched to read from the joined path of the workspace directory + filename:
content = read_file(os.path.join(CFG.workspace_path, filename))
- Updated the init function for the LocalCache class to avoid wiping the local memory every time it's instantiated.
.env file
## WORKSPACE_PATH - The workspace path to use
WORKSPACE_PATH=./autogpt/auto_gpt_workspace
config.py
class Config(metaclass=Singleton):
    """
    Configuration class to store the state of bools for different scripts access.
    """

    def __init__(self) -> None:
        """Initialize the Config class"""
        self.workspace_path = os.getenv("WORKSPACE_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace"))
        self.file_logger_path = os.getenv("FILE_LOGGER_PATH", os.path.join(os.path.dirname(__file__), "..", "auto_gpt_workspace", "file-logger.txt"))
file_operations.py
def ingest_file(
    filename: str, memory, max_length: int = 4000, overlap: int = 200
) -> None:
    """
    Ingest a file by reading its content, splitting it into chunks with a specified
    maximum length and overlap, and adding the chunks to the memory storage.

    :param filename: The name of the file to ingest
    :param memory: An object with an add() method to store the chunks in memory
    :param max_length: The maximum length of each chunk, default is 4000
    :param overlap: The number of overlapping characters between chunks, default is 200
    """
    try:
        logger.info(f"Working with file {filename}")
        # content = read_file(filename)
        content = read_file(os.path.join(CFG.workspace_path, filename))
local.py
def __init__(self, cfg) -> None:
    """Initialize a class instance

    Args:
        cfg: Config object

    Returns:
        None
    """
    workspace_path = Path(cfg.workspace_path)
    self.filename = workspace_path / f"{cfg.memory_index}.json"

    if not self.filename.exists():
        # Create an empty file if it doesn't exist
        self.data = CacheContent()
        with self.filename.open("w+b") as f:
            f.write(b"{}")
    else:
        # If file exists, load its contents
        with self.filename.open("rb") as f:
            file_content = orjson.loads(f.read())
        # Loading the CacheContent object from the file
        self.data = CacheContent(
            texts=file_content.get("texts", []),
            embeddings=np.array(
                file_content.get("embeddings", create_default_embeddings())
            ),
        )
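The load-or-create pattern used in that init function can be tried on its own. This is a simplified sketch using only the standard library: json stands in for orjson, a plain dict stands in for CacheContent, and embeddings are left out. The function name load_or_create_cache is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_or_create_cache(path: Path) -> dict:
    """Load cached texts from path, creating an empty JSON file only if
    the file does not exist yet, so existing memory is not wiped."""
    if not path.exists():
        path.write_text("{}", encoding="utf-8")
        return {"texts": []}
    data = json.loads(path.read_text(encoding="utf-8"))
    return {"texts": data.get("texts", [])}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "auto-gpt.json"
    first = load_or_create_cache(cache_file)   # file is created as {}
    cache_file.write_text(json.dumps({"texts": ["hello"]}), encoding="utf-8")
    second = load_or_create_cache(cache_file)  # existing content is kept

print(first, second)
```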
Edit: updated the LocalCache init function in autogpt/memory/local.py
Is there a PR for this, or is it waiting on something?
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.