graphrag

Provide an option to fail fast when LLM calls time out or are exhausted

Open AlonsoGuevara opened this issue 1 year ago • 6 comments

The pipeline yields "corrupted" dataframes when endpoint TPM (tokens-per-minute) thresholds have been exceeded too many times. We should provide options that determine the failure behavior of LLM-based operations, so that we can fail fast and surface the issue instead of continuing and generating incorrect dataframes.
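A minimal sketch of what such an option could look like. `FailureMode`, `LLMCallExhaustedError`, and `call_with_retries` are hypothetical names (nothing here is graphrag's actual API), `TimeoutError` stands in for whatever timeout or rate-limit error the real LLM client raises, and a real implementation would also back off between attempts:

```python
import enum
from typing import Callable, Optional


class FailureMode(enum.Enum):
    """Hypothetical setting: what to do once an LLM call has used up its retries."""

    FAIL_FAST = "fail_fast"  # raise, so the pipeline stops and reports the problem
    CONTINUE = "continue"  # return None and let the pipeline keep going


class LLMCallExhaustedError(RuntimeError):
    """Hypothetical error raised when every attempt at an LLM call has failed."""


def call_with_retries(
    call: Callable[[], str],
    max_retries: int,
    failure_mode: FailureMode,
) -> Optional[str]:
    """Try `call` up to max_retries + 1 times, then apply `failure_mode`."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        try:
            return call()
        except TimeoutError as err:  # stand-in for timeout / rate-limit errors
            last_error = err
    if failure_mode is FailureMode.FAIL_FAST:
        raise LLMCallExhaustedError(
            f"LLM call failed after {max_retries + 1} attempts"
        ) from last_error
    # CONTINUE: the missing result becomes a gap that downstream verbs may
    # silently turn into an incomplete ("corrupted") dataframe.
    return None


def always_times_out() -> str:
    raise TimeoutError("TPM limit exceeded")


print(call_with_retries(always_times_out, 2, FailureMode.CONTINUE))  # None
try:
    call_with_retries(always_times_out, 2, FailureMode.FAIL_FAST)
except LLMCallExhaustedError as err:
    print(err)  # LLM call failed after 3 attempts
```

With `FAIL_FAST`, the run stops at the call that actually failed instead of surfacing later as an unrelated-looking error in a downstream verb.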

AlonsoGuevara · Apr 09 '24


{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "cause": "Columns must be same length as key", "stack": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 402, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/verbs/graph/clustering/cluster_graph.py\", line 95, in cluster_graph\n    output_df[[level_to, to]] = pd.DataFrame(\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4287, in __setitem__\n    self._setitem_array(key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4329, in _setitem_array\n    check_key_length(self.columns, key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n    raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "details": null}
{"type": "on_step_progress", "data": "Step cluster_graph completed 0 out of 0 items.", "details": {"workflow_name": "create_base_entity_graph", "node_id": "", "verb_name": "cluster_graph", "verb_func_exec": "cluster_graph", "percent": 1, "description": null, "total_items": null, "completed_items": null}}
{"type": "on_step_end", "data": "Step cluster_graph completed.", "details": {"workflow_name": "create_base_entity_graph", "node_id": "", "verb_name": "cluster_graph", "verb_func_exec": "cluster_graph"}}
{"type": "error", "data": "Error running pipeline!", "cause": "Columns must be same length as key", "stack": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/run.py\", line 269, in run_pipeline\n    workflow_result = await workflow.run(context, callbacks)\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 362, in run\n    timing = await self._execute_verb(node, context, callbacks)\n  File \"/usr/local/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 402, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/usr/local/lib/python3.10/site-packages/indexing_engine/verbs/graph/clustering/cluster_graph.py\", line 95, in cluster_graph\n    output_df[[level_to, to]] = pd.DataFrame(\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4287, in __setitem__\n    self._setitem_array(key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/frame.py\", line 4329, in _setitem_array\n    check_key_length(self.columns, key, value)\n  File \"/usr/local/lib/python3.10/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n    raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "details": null}
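The log is consistent with `cluster_graph` receiving no usable rows (it reports "0 out of 0 items"). A minimal pandas sketch reproduces the same `ValueError`, assuming hypothetical column names `level` and `clustered_graph` in place of the `level_to`/`to` variables in `cluster_graph.py`: unpacking an empty column yields a DataFrame with zero columns, which cannot be assigned to a two-column key.

```python
import pandas as pd

# Hypothetical stand-in for the step's input after upstream LLM calls failed:
# the dataframe has the expected column but zero rows.
output_df = pd.DataFrame({"clustered_graph": []})

# Each row is expected to unpack into (level, graph). With zero rows there is
# nothing to unpack, so this DataFrame has zero columns instead of two.
unpacked = pd.DataFrame(output_df["clustered_graph"].tolist(), index=output_df.index)

try:
    output_df[["level", "clustered_graph"]] = unpacked
except ValueError as err:
    print(err)  # Columns must be same length as key
```

Nothing in that message points back to the LLM failures upstream, which is why this kind of silent failure is hard to diagnose.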

AlonsoGuevara · Apr 09 '24

More brainstorming: should we add a validation verb that checks a dataframe against an expected schema? Or validate by default (via an annotation?) on every verb?
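Both ideas can be sketched in a few lines, assuming a hypothetical `validate_schema` check (usable on its own as a validation verb) and a hypothetical `@validates_output` decorator playing the role of the per-verb annotation. Neither exists in datashaper or graphrag, and the schema format (column name mapped to a NumPy dtype kind) is only an illustrative choice:

```python
import functools
from typing import Callable, Dict

import pandas as pd


class SchemaValidationError(ValueError):
    """Hypothetical error: a verb's output does not match its expected schema."""


def validate_schema(
    df: pd.DataFrame, expected: Dict[str, str], allow_empty: bool = False
) -> None:
    """Check required columns, non-emptiness, and dtype kinds.

    `expected` maps column name -> NumPy dtype kind, e.g. "i" (int),
    "f" (float), "O" (object).
    """
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise SchemaValidationError(f"missing columns: {missing}")
    if not allow_empty and df.empty:
        raise SchemaValidationError("output has no rows")
    for col, kind in expected.items():
        actual = df[col].dtype.kind
        if actual != kind:
            raise SchemaValidationError(
                f"column {col!r} has dtype kind {actual!r}, expected {kind!r}"
            )


def validates_output(expected: Dict[str, str], allow_empty: bool = False):
    """Hypothetical decorator: validate a verb's returned dataframe."""

    def decorator(verb: Callable[..., pd.DataFrame]) -> Callable[..., pd.DataFrame]:
        @functools.wraps(verb)
        def wrapper(*args, **kwargs) -> pd.DataFrame:
            result = verb(*args, **kwargs)
            validate_schema(result, expected, allow_empty)
            return result

        return wrapper

    return decorator


@validates_output({"title": "O", "level": "i"})
def extract_entities(rows):  # hypothetical verb
    return pd.DataFrame(rows, columns=["title", "level"])


print(len(extract_entities([("ALICE", 0), ("BOB", 1)])))  # 2
try:
    extract_entities([])
except SchemaValidationError as err:
    print(err)  # output has no rows
```

The decorator form keeps validation next to each verb's definition, while the standalone function could be wired into a workflow as an explicit step.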

AlonsoGuevara · Apr 10 '24

Tracked in https://github.com/microsoft/datashaper/issues/708

darthtrevino · Apr 16 '24

This issue is still relevant. When TPM limits are low or too many requests are sent to the LLM, graphrag users are opening issues whose root cause is this problem: a silent failure caused by insufficient data validation, which leads to corrupt dataframes.

jgbradley1 · Jul 13 '24

Related to #811

natoverse · Aug 09 '24