graphrag Provide an option to fail fast when LLM calls time out or are exhausted

The pipeline yields "corrupted" dataframes when the endpoint's TPM (tokens-per-minute) thresholds have been exceeded too many times. We should provide options that determine the failure behavior of LLM-based operations, so that the pipeline can fail fast and report the issue instead of continuing and generating incorrect dataframes.
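
To make the proposal concrete, here is a minimal sketch of what such an option could look like. It is not graphrag's actual API: `RetryPolicy`, `LLMCallExhaustedError`, `call_with_policy`, and the `fail_fast` flag are hypothetical names used only for illustration, and `TimeoutError` stands in for whatever timeout or rate-limit exception the LLM client raises.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class LLMCallExhaustedError(RuntimeError):
    """Hypothetical error: an LLM call still failed after all retries."""


@dataclass
class RetryPolicy:
    """Hypothetical settings object; none of these names come from graphrag."""

    max_retries: int = 3
    base_delay_s: float = 1.0
    fail_fast: bool = True  # True: raise on exhaustion; False: return a fallback
    retry_on: Tuple[Type[Exception], ...] = (TimeoutError,)


def call_with_policy(
    fn: Callable[[], T],
    policy: RetryPolicy,
    fallback: Optional[T] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run fn with exponential backoff, then apply the configured failure mode."""
    last_error: Optional[Exception] = None
    for attempt in range(policy.max_retries + 1):
        try:
            return fn()
        except policy.retry_on as exc:
            last_error = exc
            if attempt < policy.max_retries:
                # Back off before the next attempt: base, 2*base, 4*base, ...
                sleep(policy.base_delay_s * 2**attempt)
    if policy.fail_fast:
        # Surface the problem here instead of passing bad data downstream.
        raise LLMCallExhaustedError(
            f"LLM call failed after {policy.max_retries + 1} attempts"
        ) from last_error
    # Legacy-style behavior: continue with a placeholder result.
    return fallback
```

With `fail_fast=True`, the run stops at the LLM step that exhausted its retries, so the error points at the real cause rather than at a later dataframe operation.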

Apr 09 '24 17:04 AlonsoGuevara


{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "cause": "Columns must be same length as key", "stack": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 402, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/verbs/graph/clustering/cluster_graph.py\", line 95, in cluster_graph\n    output_df[[level_to, to]] = pd.DataFrame(\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4287, in __setitem__\n    self._setitem_array(key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4329, in _setitem_array\n    check_key_length(self.columns, key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n    raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "details": null}
{"type": "on_step_progress", "data": "Step cluster_graph completed 0 out of 0 items.", "details": {"workflow_name": "create_base_entity_graph", "node_id": "", "verb_name": "cluster_graph", "verb_func_exec": "cluster_graph", "percent": 1, "description": null, "total_items": null, "completed_items": null}}
{"type": "on_step_end", "data": "Step cluster_graph completed.", "details": {"workflow_name": "create_base_entity_graph", "node_id": "", "verb_name": "cluster_graph", "verb_func_exec": "cluster_graph"}}
{"type": "error", "data": "Error running pipeline!", "cause": "Columns must be same length as key", "stack": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/run.py\", line 269, in run_pipeline\n    workflow_result = await workflow.run(context, callbacks)\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 362, in run\n    timing = await self._execute_verb(node, context, callbacks)\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 402, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/verbs/graph/clustering/cluster_graph.py\", line 95, in cluster_graph\n    output_df[[level_to, to]] = pd.DataFrame(\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4287, in __setitem__\n    self._setitem_array(key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4329, in _setitem_array\n    check_key_length(self.columns, key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n    raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "details": null}

Apr 09 '24 17:04 AlonsoGuevara

2024-04-08-17 01 05 062062.logs.json

Apr 09 '24 17:04 AlonsoGuevara

More brainstorming: Have a validation verb against an expected schema? Or validate by default (annotation?) on every verb?
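
A rough sketch of the "annotation on every verb" idea, under stated assumptions: a table is modeled as a plain list of row dicts instead of a real dataframe, and `expects_schema`, `validate_table`, `SchemaValidationError`, and the example verb `extract_entities` are hypothetical names, not part of datashaper or graphrag.

```python
from functools import wraps
from typing import Callable, List, Sequence

Table = List[dict]  # stand-in for a dataframe: one dict per row


class SchemaValidationError(ValueError):
    """Hypothetical error: a verb's output does not match its expected schema."""


def validate_table(
    table: Table,
    required_columns: Sequence[str],
    allow_empty: bool = False,
    step: str = "",
) -> Table:
    """Check that the table is non-empty and every required column has a value."""
    if not table and not allow_empty:
        raise SchemaValidationError(f"{step}: output is empty")
    for i, row in enumerate(table):
        missing = [c for c in required_columns if row.get(c) is None]
        if missing:
            raise SchemaValidationError(f"{step}: row {i} is missing {missing}")
    return table


def expects_schema(*required_columns: str, allow_empty: bool = False) -> Callable:
    """Decorator form: validate a verb's output right after the verb runs."""

    def decorator(verb: Callable[..., Table]) -> Callable[..., Table]:
        @wraps(verb)
        def wrapper(*args, **kwargs) -> Table:
            result = verb(*args, **kwargs)
            return validate_table(
                result, required_columns, allow_empty, step=verb.__name__
            )

        return wrapper

    return decorator


@expects_schema("entity", "description")
def extract_entities(chunks: List[str]) -> Table:
    """Toy verb: the chunk 'bad' simulates an LLM call that returned nothing."""
    return [
        {"entity": c.upper(), "description": None if c == "bad" else f"about {c}"}
        for c in chunks
    ]
```

The same check could instead live in a standalone validation verb placed between workflow steps. The decorator form has the advantage that the error names the verb that produced the bad output.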

Apr 10 '24 17:04 AlonsoGuevara

Tracked in https://github.com/microsoft/datashaper/issues/708

Apr 16 '24 15:04 darthtrevino

This issue is still relevant. When TPM is low or too many requests are made to the LLM, graphrag users are starting to open issues where this is the root cause (i.e., a silent failure caused by insufficient data validation, which leads to corrupt dataframes).

Jul 13 '24 16:07 jgbradley1

Related to #811

Aug 09 '24 00:08 natoverse
