
Versioning for Graphs and Functions in Python SDK

Open PulkitMishra opened this issue 1 year ago • 2 comments

Introduce Versioning for Graphs and Functions

Issue Description

Currently, the Indexify Python SDK lacks a robust versioning system for graphs and functions. This makes it challenging to manage changes over time, track the evolution of workflows, and ensure reproducibility of results. Implementing a versioning system will significantly improve the maintainability and reliability of Indexify workflows.

Current Limitations

  1. In indexify/functions_sdk/graph.py, the Graph class doesn't have any version information:
class Graph:
    def __init__(
        self, name: str, start_node: IndexifyFunction, description: Optional[str] = None
    ):
        self.name = name
        self.description = description
        self.nodes: Dict[str, Union[IndexifyFunction, IndexifyRouter]] = {}
        # ...
  2. The indexify_function decorator in indexify/functions_sdk/indexify_functions.py doesn't include version information:
def indexify_function(
    name: Optional[str] = None,
    description: Optional[str] = "",
    image: Optional[Image] = DEFAULT_IMAGE,
    accumulate: Optional[Type[BaseModel]] = None,
    payload_encoder: Optional[str] = "cloudpickle",
    placement_constraints: List[PlacementConstraints] = [],
):
    # ...
  3. When registering a compute graph in indexify/remote_client.py, there's no version handling:
def register_compute_graph(self, graph: Graph):
    graph_metadata = graph.definition()
    serialized_code = graph.serialize()
    response = self._post(
        f"namespaces/{self.namespace}/compute_graphs",
        files={"code": serialized_code},
        data={"compute_graph": graph_metadata.model_dump_json(exclude_none=True)},
    )
    # ...
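To make limitations 1 and 2 concrete, here is a minimal sketch of what an explicit version field could look like. `VersionedFunction`, `versioned_function`, and `VersionedGraph` are hypothetical stand-ins written for illustration. They are not the real `IndexifyFunction`, `indexify_function`, or `Graph`, and the shape of `definition()` is an assumption about what a client might send at registration time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


# Hypothetical stand-in for an SDK function wrapper (not the real IndexifyFunction).
@dataclass
class VersionedFunction:
    name: str
    fn: Callable
    version: str

    def __call__(self, *args, **kwargs):
        # Delegate to the wrapped function so the object stays callable.
        return self.fn(*args, **kwargs)


def versioned_function(name: Optional[str] = None, version: str = "0.1.0"):
    """Hypothetical decorator that records an explicit version next to the function."""
    def wrap(fn: Callable) -> VersionedFunction:
        return VersionedFunction(name=name or fn.__name__, fn=fn, version=version)
    return wrap


# Hypothetical stand-in for Graph with an explicit version field.
@dataclass
class VersionedGraph:
    name: str
    version: str
    description: Optional[str] = None
    nodes: Dict[str, VersionedFunction] = field(default_factory=dict)

    def add_node(self, fn: VersionedFunction) -> None:
        self.nodes[fn.name] = fn

    def definition(self) -> dict:
        # Assumed registration metadata: graph version plus each node's version.
        return {
            "name": self.name,
            "version": self.version,
            "description": self.description,
            "nodes": {n: f.version for n, f in self.nodes.items()},
        }


@versioned_function(version="1.2.0")
def extract_text(doc: str) -> str:
    return doc.strip()


g = VersionedGraph(name="pdf-pipeline", version="2.0.0")
g.add_node(extract_text)
print(g.definition())
```

Recording per-node versions inside the graph definition would let a registered graph state exactly which function versions it was built from, which is the reproducibility benefit the issue describes.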

Benefits of Versioning

  1. Reproducibility: Ensure that workflows can be reproduced exactly, even as individual functions or the overall graph structure evolves.
  2. Change Tracking: Easily track changes to functions and graphs over time, facilitating debugging and auditing.
  3. Collaboration: Enable multiple team members to work on the same workflow without conflicts.
  4. Rollback Capability: Quickly revert to previous versions of functions or entire graphs if issues are discovered.
  5. A/B Testing: Compare different versions of workflows or functions to optimize performance.

Proposed Solution

  1. Add version information to the Graph class
  2. Modify the indexify_function decorator to include version information
  3. Update the register_compute_graph method to handle versioning
  4. Implement version comparison and management utilities
  5. Update the LocalClient and RemoteClient classes to support versioning operations
  6. Modify the Task class in indexify/executor/api_objects.py to include version information.
  7. Update all relevant tests to include version checks
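Item 4 (version comparison utilities) could start from something as small as the class below. `Version` is hypothetical. It assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build tags, and it treats a shared major version as compatible, following the semantic-versioning convention.

```python
from functools import total_ordering
from typing import Tuple


@total_ordering
class Version:
    """Hypothetical MAJOR.MINOR.PATCH version with numeric ordering."""

    def __init__(self, text: str):
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        self.major, self.minor, self.patch = (int(p) for p in parts)

    def _key(self) -> Tuple[int, int, int]:
        return (self.major, self.minor, self.patch)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other: "Version") -> bool:
        if not isinstance(other, Version):
            return NotImplemented
        # Compare as integer tuples, so 1.2.3 < 1.10.0.
        return self._key() < other._key()

    def __hash__(self) -> int:
        return hash(self._key())

    def __repr__(self) -> str:
        return f"Version('{self.major}.{self.minor}.{self.patch}')"

    def is_compatible_with(self, other: "Version") -> bool:
        # Semantic-versioning convention: same major version means compatible API.
        return self.major == other.major


versions = [Version("1.10.0"), Version("1.2.3"), Version("2.0.0")]
print(max(versions))  # Version('2.0.0')
print(sorted(versions))  # [Version('1.2.3'), Version('1.10.0'), Version('2.0.0')]
print(Version("1.2.3").is_compatible_with(Version("1.10.0")))  # True
```

Comparing parsed integers rather than raw strings matters here: as strings, `"1.10.0" < "1.2.3"` is `True`, which would order versions incorrectly.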

PulkitMishra • Oct 01 '24

Versioning is done automatically by the server when the code changes. Take a look at https://github.com/tensorlakeai/indexify/blob/main/python-sdk/tests/test_graph_update.py
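One way to picture automatic, change-driven versioning is a content hash: the version only changes when the serialized code changes. The sketch below is an illustrative assumption. `AutoVersioner` is hypothetical, SHA-256 over the serialized bytes is a stand-in, and this is not necessarily how the Indexify server detects changes.

```python
import hashlib
from typing import Dict, Tuple


class AutoVersioner:
    """Hypothetical registry that bumps an integer version only when code bytes change."""

    def __init__(self) -> None:
        # graph name -> (current version, SHA-256 digest of the code at that version)
        self._state: Dict[str, Tuple[int, str]] = {}

    def register(self, graph_name: str, serialized_code: bytes) -> int:
        digest = hashlib.sha256(serialized_code).hexdigest()
        current = self._state.get(graph_name)
        if current is not None and current[1] == digest:
            return current[0]  # unchanged code: keep the existing version
        new_version = 1 if current is None else current[0] + 1
        self._state[graph_name] = (new_version, digest)
        return new_version


v = AutoVersioner()
print(v.register("pipeline", b"def f(): return 1"))  # 1
print(v.register("pipeline", b"def f(): return 1"))  # 1 (same code, no bump)
print(v.register("pipeline", b"def f(): return 2"))  # 2
```

In this scheme, re-registering older code yields a new version number (3 here) rather than returning to version 1, so rollback is a separate concern from change detection.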

diptanu • Oct 01 '24

@diptanu I did see that, but while test_graph_update.py does demonstrate a basic form of updating a graph, it doesn't provide the comprehensive versioning system described in the issue.

Users don't have direct control over versioning through the SDK. That's okay, and it makes sense why it's designed that way, but it's also a bit limiting, since users can't specify version numbers. Without explicit versioning, users may find it hard to manage complex workflows or collaborate effectively, especially in larger teams.

More importantly, there seems to be no way to roll back to previous versions or to manage multiple versions simultaneously, as pointed out in the issue. The SDK doesn't provide methods for users to query or inspect different versions of a graph or function.
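The rollback, query, and multi-version requests could look roughly like the client-side store below. `GraphRevision` and `GraphVersionStore` are hypothetical. The sketch assumes revisions are kept in memory and never deleted; a real implementation would need server-side support.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GraphRevision:
    version: int
    code: bytes
    note: str = ""


class GraphVersionStore:
    """Hypothetical store: keeps every revision plus an 'active' pointer per graph."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[GraphRevision]] = {}
        self._active: Dict[str, int] = {}

    def publish(self, name: str, code: bytes, note: str = "") -> int:
        history = self._revisions.setdefault(name, [])
        rev = GraphRevision(version=len(history) + 1, code=code, note=note)
        history.append(rev)
        self._active[name] = rev.version  # newest revision becomes active
        return rev.version

    def list_versions(self, name: str) -> List[int]:
        return [r.version for r in self._revisions.get(name, [])]

    def get(self, name: str, version: Optional[int] = None) -> GraphRevision:
        # Default to the active version; otherwise fetch the requested one.
        wanted = self._active[name] if version is None else version
        for rev in self._revisions.get(name, []):
            if rev.version == wanted:
                return rev
        raise KeyError(f"{name} has no version {wanted}")

    def rollback(self, name: str, version: int) -> None:
        self.get(name, version)  # raises KeyError if the version does not exist
        self._active[name] = version  # newer revisions are kept, just not active

    def active_version(self, name: str) -> int:
        return self._active[name]


store = GraphVersionStore()
store.publish("pipeline", b"v1 code", note="initial")
store.publish("pipeline", b"v2 code", note="new chunker")
store.rollback("pipeline", 1)
print(store.list_versions("pipeline"))   # [1, 2]
print(store.active_version("pipeline"))  # 1
print(store.get("pipeline").note)        # initial
```

Because rollback only moves the active pointer, newer revisions stay queryable by number, which would also allow running two versions side by side for A/B comparisons.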

PulkitMishra • Oct 02 '24