KeyError: 'previous_episodes' on bulk ingest
Hello, I'm trying to ingest a chunked text file into an empty graph using the add_episode_bulk() method. I have run build_indices_and_constraints() first. Using:
- Neo4j 5.25.1
- Python 3.12.7
- Graphiti 0.4.2
- gpt-4o-mini
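import datetime

from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode


async def bulk_ingest(graphiti, doc_metadata: dict, doc_chunks: list[str]):
    # Build one RawEpisode per text chunk; every chunk shares the document's metadata.
    bulk_episodes = [
        RawEpisode(
            name=doc_metadata["composite_file_name"],
            content=row,
            source=EpisodeType.text,
            source_description=doc_metadata["source"],
            reference_time=datetime.datetime.strptime(
                doc_metadata["report_date"], r"%Y-%m-%d"
            ),
        )
        for row in doc_chunks
    ]
    # Only the first two episodes are submitted here.
    await graphiti.add_episode_bulk(bulk_episodes[:2])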
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 138, in <module>
    asyncio.run(main(config))
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 82, in main
    await bulk_ingest(graphiti, filing, doc_chunks)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 111, in bulk_ingest
    await graphiti.add_episode_bulk(bulk_episodes[:2])
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 593, in add_episode_bulk
    raise e
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 557, in add_episode_bulk
    (nodes, uuid_map), extracted_edges_timestamped = await asyncio.gather(
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/bulk_utils.py", line 177, in dedupe_nodes_bulk
    await asyncio.gather(
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/node_operations.py", line 189, in dedupe_extracted_nodes
    llm_response = await llm_client.generate_response(prompt_library.dedupe_nodes.node(context))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/lib.py", line 109, in __call__
    return self.func(context)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/dedupe_nodes.py", line 43, in node
    {json.dumps([ep for ep in context['previous_episodes']], indent=2)}
                              ~~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'previous_episodes'
The context built in dedupe_extracted_nodes does not contain the keys that the dedupe_nodes prompt references, such as 'previous_episodes':
https://github.com/getzep/graphiti/blob/e42d3ae46c08369a7ecb26e663661a746a177eeb/graphiti_core/utils/maintenance/node_operations.py#L184-L189
https://github.com/getzep/graphiti/blob/e42d3ae46c08369a7ecb26e663661a746a177eeb/graphiti_core/prompts/dedupe_nodes.py#L42-L47
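To illustrate the mismatch, here is a minimal sketch that does not use Graphiti at all. The functions `node_prompt` and `node_prompt_tolerant` and the keys in `bulk_context` are hypothetical stand-ins; only the 'previous_episodes' key is taken from the traceback. It shows how a prompt builder that indexes a missing key raises the same KeyError, and how reading the key with a default would avoid the crash. Whether an empty default is the right behaviour for the real prompt is an assumption, not something confirmed in this thread.

```python
import json


def node_prompt(context: dict) -> str:
    # Stand-in for a prompt builder that indexes the context directly,
    # the same access pattern as the failing line in the traceback.
    episodes = json.dumps([ep for ep in context['previous_episodes']], indent=2)
    return f"Previous episodes:\n{episodes}"


def node_prompt_tolerant(context: dict) -> str:
    # Variant that falls back to an empty list when the key is absent.
    episodes = json.dumps(context.get("previous_episodes", []), indent=2)
    return f"Previous episodes:\n{episodes}"


# Hypothetical context that, like the one built on the bulk path,
# lacks the 'previous_episodes' key.
bulk_context = {"extracted_nodes": [], "existing_nodes": []}

try:
    node_prompt(bulk_context)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
print(node_prompt_tolerant(bulk_context))
```

The alternative fix is on the caller's side: make sure the context passed to the prompt always carries every key the prompt reads.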
The add_episode_bulk method is currently a work in progress; I can add a comment about that in the code. That is also why we don't have it in our documentation.
Sorry about the confusion!
It's in the docs here: https://help.getzep.com/graphiti/graphiti/adding-episodes#loading-episodes-in-bulk
Do you have a roadmap for features like this? It would be great to be able to ingest in bulk, given that the regular add_episode method gets progressively slower with every episode. Could you also share a guideline on the optimal episode content size? Thanks!
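For reference, the sequential fallback I'm using in the meantime looks roughly like the sketch below. The `Chunk` dataclass and `ingest_sequentially` helper are hypothetical names of my own, and the keyword arguments passed to add_episode (name, episode_body, source, source_description, reference_time) are my assumption about its signature, so check them against the installed version. It does nothing about the slowdown; it only keeps one failing chunk from aborting the whole run.

```python
import asyncio
import datetime
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical container for one text chunk and its metadata.
    name: str
    content: str
    source_description: str
    reference_time: datetime.datetime


async def ingest_sequentially(graphiti, chunks, source):
    """Add chunks one at a time and collect (name, exception) for failures.

    `graphiti` is any object exposing an async add_episode method;
    `source` would be something like EpisodeType.text.
    """
    failures = []
    for chunk in chunks:
        try:
            await graphiti.add_episode(
                name=chunk.name,
                episode_body=chunk.content,
                source=source,
                source_description=chunk.source_description,
                reference_time=chunk.reference_time,
            )
        except Exception as exc:  # keep going; report failures at the end
            failures.append((chunk.name, exc))
    return failures
```

It would be called as `failures = await ingest_sequentially(graphiti, chunks, EpisodeType.text)` from inside the existing async main.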
Closing this as stale.