
Improve Error Handling in Indexify Python SDK

Open PulkitMishra opened this issue 4 months ago • 3 comments

Issue Description

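The Indexify Python SDK currently handles errors inconsistently: it prints exceptions directly instead of logging them, catches exception types that are either too narrow or too broad, and attaches little context to the failures it reports.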

Specific Examples

  1. In indexify/remote_client.py, the _request method:
def _request(self, method: str, **kwargs) -> httpx.Response:
    try:
        response = self._client.request(method, timeout=self._timeout, **kwargs)
        status_code = str(response.status_code)
        if status_code.startswith("4"):
            raise ApiException(
                "status code: " + status_code + " request args: " + str(kwargs)
            )
        if status_code.startswith("5"):
            raise ApiException(response.text)
    except httpx.ConnectError:
        message = (
            f"Make sure the server is running and accesible at {self._service_url}"
        )
        ex = ApiException(status="ConnectionError", message=message)
        print(ex)
        raise ex
    return response

Issues:

  • Only handles httpx.ConnectError, ignoring other potential exceptions.
  • Prints the exception before raising it, which may not be appropriate for all use cases.
  • Doesn't provide detailed context about the failed operation.
  2. In indexify/executor/function_worker.py, the async_submit method:
async def async_submit(
    self,
    namespace: str,
    graph_name: str,
    fn_name: str,
    input: IndexifyData,
    code_path: str,
    version: int,
    init_value: Optional[IndexifyData] = None,
) -> FunctionWorkerOutput:
    try:
        result = await asyncio.get_running_loop().run_in_executor(
            self._executor,
            _run_function,
            namespace,
            graph_name,
            fn_name,
            input,
            code_path,
            version,
            init_value,
        )
    except BrokenProcessPool as mp:
        self._executor.shutdown(wait=True, cancel_futures=True)
        traceback.print_exc()
        raise mp
    except FunctionRunException as e:
        print(e)
        print(traceback.format_exc())
        return FunctionWorkerOutput(
            exception=str(e),
            stdout=e.stdout,
            stderr=e.stderr,
            reducer=e.is_reducer,
            success=False,
        )

Issues:

  • Prints exception information directly, which may not be appropriate for all environments.
  • Doesn't provide a way to customize error handling or logging.
  • Doesn't capture or report the full context of the error (e.g., input data, function details).
  3. In indexify/executor/agent.py, the task_completion_reporter method:
async def task_completion_reporter(self):
    console.print(Text("Starting task completion reporter", style="bold cyan"))
    url = f"{self._protocol}://{self._server_addr}/write_content"
    while True:
        outcomes = await self._task_store.task_outcomes()
        for task_outcome in outcomes:
            # ... (omitted for brevity)
            try:
                self._task_reporter.report_task_outcome(completed_task=task_outcome)
            except Exception as e:
                console.print(
                    Panel(
                        f"Failed to report task {task_outcome.task.id}\n"
                        f"Exception: {e}\nRetrying...",
                        title="Reporting Error",
                        border_style="error",
                    )
                )
                await asyncio.sleep(5)
                continue

Issues:

  • Uses a generic Exception catch, which may mask specific errors.
  • Doesn't provide a mechanism for custom error handling or reporting.
  • The retry mechanism is simplistic and may lead to infinite retries for persistent errors.
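
To make the point about printing concrete: the print(ex) and traceback.print_exc() calls in the examples above could be replaced with the standard logging module, so the application decides where error output goes. This is only a minimal sketch; the logger name "indexify.example" and the run_step helper are hypothetical and not part of the SDK.

```python
import logging

# Hypothetical logger name; the SDK may choose a different one.
logger = logging.getLogger("indexify.example")


def run_step(fn, *args):
    """Run fn(*args); on failure, log the traceback and re-raise."""
    try:
        return fn(*args)
    except Exception:
        # logger.exception logs at ERROR level and appends the active
        # traceback, so no separate print or traceback call is needed.
        logger.exception(
            "step %s failed with args %r", getattr(fn, "__name__", fn), args
        )
        raise
```

With this shape, callers can attach their own handlers (file, JSON, stderr) or silence the logger entirely, which print does not allow.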

Proposed Solution

  1. Create a custom exception hierarchy:

    • Implement a base IndexifyException class.
    • Create specific exception subclasses for different types of errors (e.g., NetworkError, ExecutionError, ConfigurationError).
  2. Implement a centralized error handling and logging mechanism:

    • Create an ErrorHandler class that can be configured with custom logging and reporting options.
    • Use this ErrorHandler consistently throughout the SDK.
  3. Enhance error context:

    • Modify exception classes to include more context (e.g., function name, input data summary, graph details).
    • Implement a method to safely serialize error context, avoiding potential issues with unpicklable objects.
  4. Improve retry mechanisms:

    • Implement an exponential backoff strategy for retries.
    • Allow configuration of retry attempts and conditions.
  5. Add error callback support:

    • Allow users to register custom error callbacks for specific types of errors.
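
A rough sketch of how points 1, 2, 3 and 5 might fit together is below. The class names IndexifyException, NetworkError, ExecutionError, ConfigurationError and ErrorHandler come from this proposal; everything else (safe_summary, the context keyword arguments, the register and handle methods, the logger name) is an assumption made for illustration, not an existing SDK API.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Type


def safe_summary(value: Any, limit: int = 80) -> str:
    """Return a length-bounded repr of value without ever raising."""
    try:
        text = repr(value)
    except Exception:
        text = f"<unrepresentable {type(value).__name__}>"
    return text if len(text) <= limit else text[:limit] + "..."


class IndexifyException(Exception):
    """Base class; context values are stored as plain strings."""

    def __init__(self, message: str, **context: Any) -> None:
        super().__init__(message)
        self.message = message
        # Summarize eagerly so the exception never holds the raw objects.
        self.context: Dict[str, str] = {
            key: safe_summary(val) for key, val in context.items()
        }

    def __str__(self) -> str:
        if not self.context:
            return self.message
        details = ", ".join(f"{k}={v}" for k, v in self.context.items())
        return f"{self.message} ({details})"


class NetworkError(IndexifyException):
    """The server could not be reached or the request failed in transit."""


class ExecutionError(IndexifyException):
    """A user function failed while running."""


class ConfigurationError(IndexifyException):
    """The client or executor was configured incorrectly."""


ErrorCallback = Callable[[IndexifyException], None]


class ErrorHandler:
    """Logs errors and dispatches them to user-registered callbacks."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        self._logger = logger or logging.getLogger("indexify.errors")
        self._callbacks: Dict[Type[IndexifyException], List[ErrorCallback]] = {}

    def register(
        self, exc_type: Type[IndexifyException], callback: ErrorCallback
    ) -> None:
        self._callbacks.setdefault(exc_type, []).append(callback)

    def handle(self, exc: IndexifyException) -> None:
        self._logger.error("%s: %s", type(exc).__name__, exc)
        # A callback registered for a base class also fires for subclasses.
        for exc_type, callbacks in self._callbacks.items():
            if isinstance(exc, exc_type):
                for callback in callbacks:
                    callback(exc)
```

Usage would look something like handler.register(NetworkError, alert_oncall) followed by handler.handle(NetworkError("connect failed", url=service_url)), where alert_oncall and service_url are placeholders. Storing only string summaries in context is one way to sidestep the unpicklable-object concern from point 3, at the cost of losing the original objects.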

Implementation Plan

  1. Define the exception hierarchy in a new file indexify/exceptions.py.
  2. Implement the ErrorHandler class in indexify/error_handling.py.
  3. Modify existing code to use the new exception classes and ErrorHandler:
    • Update remote_client.py to use specific exceptions and the ErrorHandler.
    • Refactor function_worker.py to provide more context in errors and use the ErrorHandler.
    • Enhance agent.py with improved error handling and retry logic.
  4. Add configuration options for error handling in the client initialization.
  5. Update documentation to reflect the new error handling capabilities.
  6. Add unit tests for the new error handling mechanisms.
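
For the retry logic planned for agent.py in step 3, a bounded exponential backoff helper might look roughly like this. The function name retry_with_backoff and all of its parameters are hypothetical, and it assumes the operation is an async callable; a synchronous call such as report_task_outcome would need to be wrapped first.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await operation(), retrying on retry_on with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller
                # instead of retrying forever.
                raise
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

Exceptions not listed in retry_on propagate immediately, which gives callers the configurable retry conditions mentioned in the proposal. Adding random jitter to the delay is a common refinement that is left out here for brevity.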

PulkitMishra, Oct 01 '24 06:10