R2R
Graph Creation Request Fails When Specifying List of document_ids
Describe the bug
Calling create_graph() with a list of document_ids via the Python SDK fails with a SQLAlchemy error.
To Reproduce
Steps to reproduce the behavior:
- Ingest documents.
- Enumerate the document ids by calling get_all_documents().
- Attempt to initiate graph generation by calling create_graph() with the list of document_ids from the previous step.
- See error:
This step failed with error (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text
LINE 4: WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...
 ^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: 
 SELECT document_id, group_ids, user_id, type, metadata, title, version, size_in_bytes, ingestion_status, created_at, updated_at, restructuring_status
 FROM document_info_local_llm_neo4j_kg
 WHERE document_id = ANY(%(document_ids)s)
 ORDER BY created_at DESC
 OFFSET %(offset)s
 LIMIT 100
 ]
[parameters: {'document_ids': ['e28e8cd3-03e8-5b07-85c3-2beef509fbb0', 'b54b0b42-e0cd-5f18-b66c-9c2d6ab197ab', 'd8e36b31-7886-5b6d-8f56-8efec19f5bf9', '157b2b58-0fdc-58db-93aa-84ed ... (3700 characters truncated) ... -e036-5851-8ebc-8c92ef765a25', '041c2250-1da4-51e3-932e-8d0364daccf1', 'ea553d32-d2ff-5604-9437-ae49b8a0e0a1', '1c7f0a3e-4f40-5339-801a-6a53c408ec44'], 'offset': 0}]
(Background on this error at: https://sqlalche.me/e/20/f405)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedFunction: operator does not exist: uuid = text
LINE 4: WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...
 ^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py", line 191, in inner_callback
    output = task.result()
  File "/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py", line 309, in async_wrapped_action_func
    raise e
  File "/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py", line 285, in async_wrapped_action_func
    return await action_func(context)
  File "/app/core/main/hatchet/restructure_workflow.py", line 83, in kg_extraction_ingress
    documents_overviews = self.restructure_service.providers.database.relational.get_documents_overview(
  File "/app/core/providers/database/document.py", line 117, in get_documents_overview
    results = self.execute_query(query, params).fetchall()
  File "/app/core/providers/database/relational.py", line 31, in execute_query
    return execute_query(self.vx, query, params)
  File "/app/core/providers/database/base.py", line 16, in execute_query
    result = sess.execute(query, params or {})
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2362, in execute
    return self._execute_internal(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2256, in _execute_internal
    result = conn.execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text
LINE 4: WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...
 ^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: 
 SELECT document_id, group_ids, user_id, type, metadata, title, version, size_in_bytes, ingestion_status, created_at, updated_at, restructuring_status
 FROM document_info_local_llm_neo4j_kg
 WHERE document_id = ANY(%(document_ids)s)
 ORDER BY created_at DESC
 OFFSET %(offset)s
 LIMIT 100
 ]
[parameters: {'document_ids': ['e28e8cd3-03e8-5b07-85c3-2beef509fbb0', 'b54b0b42-e0cd-5f18-b66c-9c2d6ab197ab', 'd8e36b31-7886-5b6d-8f56-8efec19f5bf9', '157b2b58-0fdc-58db-93aa-84ed ... (3700 characters truncated) ... -e036-5851-8ebc-8c92ef765a25', '041c2250-1da4-51e3-932e-8d0364daccf1', 'ea553d32-d2ff-5604-9437-ae49b8a0e0a1', '1c7f0a3e-4f40-5339-801a-6a53c408ec44'], 'offset': 0}]
(Background on this error at: https://sqlalche.me/e/20/f405)
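The `operator does not exist: uuid = text` error is a type mismatch. `document_id` is a `uuid` column, but psycopg2 sends the Python list of id strings as `ARRAY['...', ...]`, which Postgres types as `text[]`, and Postgres defines no `uuid = text` operator. Following the HINT, an explicit cast on the bound array should resolve it. The sketch below assumes a psycopg2-style (`%(name)s`) query shaped like the one in the traceback; `build_documents_overview_query` and its `limit` parameter are hypothetical and are not R2R's actual code.

```python
import uuid

# Hypothetical query modelled on the one in the traceback. The only functional
# change is the explicit CAST of the bound array parameter to uuid[].
DOCUMENTS_OVERVIEW_SQL = """
    SELECT document_id, group_ids, user_id, type, metadata, title, version,
           size_in_bytes, ingestion_status, created_at, updated_at,
           restructuring_status
    FROM {table}
    WHERE document_id = ANY(CAST(%(document_ids)s AS uuid[]))
    ORDER BY created_at DESC
    OFFSET %(offset)s
    LIMIT %(limit)s
"""


def build_documents_overview_query(table, document_ids, offset=0, limit=100):
    """Return (sql, params) suitable for a psycopg2 cursor.execute call.

    `table` must be a trusted, internal identifier: it is interpolated with
    str.format, not bound as a parameter.
    Each id is round-tripped through uuid.UUID, so a malformed id raises
    ValueError in Python instead of failing inside Postgres.
    """
    normalized = [str(uuid.UUID(str(doc_id))) for doc_id in document_ids]
    sql = DOCUMENTS_OVERVIEW_SQL.format(table=table)
    params = {"document_ids": normalized, "offset": offset, "limit": limit}
    return sql, params
```

`CAST(... AS uuid[])` is used rather than the `::uuid[]` shorthand because it also reads unambiguously in SQLAlchemy `text()` queries, where `::` directly after a `:name` bind parameter is easy to misparse. Another option is to pass `uuid.UUID` objects instead of strings after calling `psycopg2.extras.register_uuid()`.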
Expected behavior
Successful initiation of extraction for graph generation
Desktop:
- OS: Ubuntu
- Browser: Firefox (130.0, 64-bit)
- Version: 24.04.1 LTS
Additional context
- r2r version: 3.1.20
Looks like there's an issue with the SQL query. I'll push a fix shortly. Meanwhile, could you try running create-graph with no args? It should run on all ingested documents.
Yeah, everything works fine from the CLI or when calling create_graph() without the document_ids parameter ;) Thanks for the quick reply, I'll wait for the next bug-fix release. BUT, if I'm not mistaken: when you launch a graph creation request, the available documents are enumerated by a call to get_all_documents(), which defaults to a maximum of 100 documents. So if your collection has more than 100 documents, you will have to call create_graph() multiple times to ensure all documents are extracted.
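One way to cover a collection larger than the 100-document cap is to walk the document listing page by page and then submit the ids in batches of at most 100. This is a minimal sketch assuming an offset/limit-style listing call; `fetch_page(offset, limit)` is a hypothetical stand-in for the SDK's document-listing method, and passing each batch to `create_graph(document_ids=...)` is likewise an assumption, not a confirmed SDK signature.

```python
from typing import Callable, Iterator, List, Sequence

PAGE_SIZE = 100  # the per-call cap described in the comment above


def iter_document_ids(
    fetch_page: Callable[[int, int], Sequence[str]],
    page_size: int = PAGE_SIZE,
) -> Iterator[str]:
    """Yield every document id by walking offset-based pages.

    fetch_page(offset, limit) is hypothetical: it must return at most
    `limit` ids starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += page_size


def batched(ids: Iterator[str], size: int = PAGE_SIZE) -> Iterator[List[str]]:
    """Group ids into lists of at most `size` (e.g. one graph request each)."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Fake listing of 250 documents, served by slicing a local list.
    all_ids = [f"doc-{i}" for i in range(250)]
    fake_fetch = lambda offset, limit: all_ids[offset:offset + limit]
    print([len(b) for b in batched(iter_document_ids(fake_fetch))])  # prints [100, 100, 50]
```

Stopping on a short page avoids needing a total count from the server; if the collection size is an exact multiple of the page size, the loop costs one extra request that returns an empty page.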
Closing this as stale, see https://r2r-docs.sciphi.ai/cookbooks/graphs for an overview of the many improvements made in the graph process over the last few months.