KeyError: 'previous_episodes' on bulk ingest

Open StefanDimitrov95 opened this issue 1 year ago • 2 comments

Hello, I'm trying to ingest a chunked text file into an empty graph using the add_episode_bulk() method. I ran build_indices_and_constraints() first. I'm using:

  • Neo4j 5.25.1
  • Python 3.12.7
  • Graphiti 0.4.2
  • gpt-4o-mini
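
import datetime

from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode


async def bulk_ingest(graphiti, doc_metadata: dict, doc_chunks: list[str]):
    # Build one RawEpisode per text chunk; all chunks share the document metadata.
    bulk_episodes = [
        RawEpisode(
            name=doc_metadata["composite_file_name"],
            content=row,
            source=EpisodeType.text,
            source_description=doc_metadata["source"],
            # report_date is expected as a YYYY-MM-DD string.
            reference_time=datetime.datetime.strptime(
                doc_metadata["report_date"], r"%Y-%m-%d"
            ),
        )
        for row in doc_chunks
    ]
    # Submit only the first two episodes.
    await graphiti.add_episode_bulk(bulk_episodes[:2])
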
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 138, in <module>
    asyncio.run(main(config))
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 82, in main
    await bulk_ingest(graphiti, filing, doc_chunks)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 111, in bulk_ingest
    await graphiti.add_episode_bulk(bulk_episodes[:2])
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 593, in add_episode_bulk
    raise e
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 557, in add_episode_bulk
    (nodes, uuid_map), extracted_edges_timestamped = await asyncio.gather(
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/bulk_utils.py", line 177, in dedupe_nodes_bulk
    await asyncio.gather(
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/node_operations.py", line 189, in dedupe_extracted_nodes
    llm_response = await llm_client.generate_response(prompt_library.dedupe_nodes.node(context))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/lib.py", line 109, in __call__
    return self.func(context)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/dedupe_nodes.py", line 43, in node
    {json.dumps([ep for ep in context['previous_episodes']], indent=2)}
                              ~~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'previous_episodes'

The context built in node_operations.py (first link) does not contain the keys that the dedupe_nodes prompt reads (second link):
https://github.com/getzep/graphiti/blob/e42d3ae46c08369a7ecb26e663661a746a177eeb/graphiti_core/utils/maintenance/node_operations.py#L184-L189
https://github.com/getzep/graphiti/blob/e42d3ae46c08369a7ecb26e663661a746a177eeb/graphiti_core/prompts/dedupe_nodes.py#L42-L47

StefanDimitrov95, Dec 03 '24 09:12
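
For readers following the traceback, the failure mode reported above can be reproduced in isolation. The sketch below is not graphiti's code: the function names `build_dedupe_prompt` and `build_dedupe_prompt_tolerant` and the keys in `bulk_context` are hypothetical stand-ins. It only illustrates how a prompt builder that indexes `context['previous_episodes']` directly raises KeyError when the caller never sets that key, and how a defaulted lookup would avoid the crash.

```python
import json


# Hypothetical stand-in for a prompt builder; NOT graphiti's actual implementation.
def build_dedupe_prompt(context: dict) -> str:
    # Direct indexing raises KeyError when the caller omitted the key.
    previous = context["previous_episodes"]
    return f"Previous episodes:\n{json.dumps(previous, indent=2)}"


# Hypothetical tolerant variant: a missing key is treated as "no prior episodes".
def build_dedupe_prompt_tolerant(context: dict) -> str:
    previous = context.get("previous_episodes", [])
    return f"Previous episodes:\n{json.dumps(previous, indent=2)}"


# Illustrative context that lacks the episode history (key names are assumptions).
bulk_context = {"extracted_nodes": [], "existing_nodes": []}

missing_key = None
try:
    build_dedupe_prompt(bulk_context)
except KeyError as exc:
    # exc.args[0] holds the name of the key that was not found.
    missing_key = exc.args[0]

tolerant_prompt = build_dedupe_prompt_tolerant(bulk_context)
```

Whether a default is the right fix depends on what the prompt is meant to do with prior episodes; the alternative is for the caller to always populate the key before building the prompt.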

The add_episode_bulk method is currently a work in progress (WIP). I can add a comment about it in the code. We don't have it in our documentation for this reason.

Sorry about the confusion!

prasmussen15, Dec 04 '24 16:12

It's in the docs here: https://help.getzep.com/graphiti/graphiti/adding-episodes#loading-episodes-in-bulk

Do you have a roadmap for such features? It would be great to ingest in bulk, given that the regular add_episode method gets progressively slower with every episode. Could you also share a guideline on the optimal episode content size? Thanks!

StefanDimitrov95, Dec 04 '24 16:12
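
While bulk ingestion is unavailable, the same chunks can be fed through the regular add_episode method one at a time. The sketch below is a minimal fallback under stated assumptions: `ingest_sequentially` is a hypothetical helper, the `_{index}` name suffix is an illustrative choice, and the `source` argument is expected to be something like `EpisodeType.text` passed in by the caller. It does nothing about the per-episode slowdown mentioned above.

```python
import asyncio
import datetime


async def ingest_sequentially(graphiti, doc_metadata: dict, doc_chunks: list[str], source) -> int:
    """Hypothetical helper: add each chunk as its own episode, in order."""
    # Parse the report date once; every chunk shares the same reference time.
    reference_time = datetime.datetime.strptime(doc_metadata["report_date"], "%Y-%m-%d")
    for index, chunk in enumerate(doc_chunks):
        # Awaiting each call keeps ingestion strictly sequential.
        await graphiti.add_episode(
            name=f"{doc_metadata['composite_file_name']}_{index}",  # suffix is an assumption
            episode_body=chunk,
            source=source,
            source_description=doc_metadata["source"],
            reference_time=reference_time,
        )
    # Return the number of episodes submitted.
    return len(doc_chunks)
```

With a real client this would be called as `await ingest_sequentially(graphiti, filing, doc_chunks, EpisodeType.text)` from inside the existing `main` coroutine.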

Closing this as stale.

danielchalef, Jun 27 '25 22:06