indexify
indexify copied to clipboard
Improve Scalability and Efficiency of Python SDK
Improve Scalability and Efficiency with Graph Optimization and Streaming Support
Issue Description
The current implementation of Indexify's Python SDK may face challenges with very large graphs or high-throughput scenarios on SDK side. We need to add support for streaming and parallel processing to improve execution efficiency and handle continuous data flows more effectively.
Current Limitations
-
Large Graph Handling: The
Graphclass inindexify/functions_sdk/graph.pyloads all nodes and edges into memory, which may not be efficient for very large graphs. -
Synchronous Execution: The
_runmethod inLocalClient(indexify/local_client.py) processes nodes sequentially, which may not be optimal for large graphs or high-throughput scenarios. -
Lack of Streaming Support: The current implementation doesn't have native support for processing continuous data streams efficiently.
Examples from Codebase
-
Graph Loading (indexify/functions_sdk/graph.py):
class Graph: def __init__(self, name: str, start_node: IndexifyFunction, description: Optional[str] = None): self.nodes: Dict[str, Union[IndexifyFunction, IndexifyRouter]] = {} self.routers: Dict[str, List[str]] = defaultdict(list) self.edges: Dict[str, List[str]] = defaultdict(list)This in-memory representation may not scale well for very large graphs.
-
Synchronous Execution (indexify/local_client.py):
def _run(self, g: Graph, initial_input: bytes, outputs: Dict[str, List[bytes]]): queue = deque([(g._start_node, initial_input)]) while queue: node_name, input = queue.popleft() # Process node...This sequential processing may become a bottleneck for large graphs or high-throughput scenarios.
-
Lack of Streaming Support: There's currently no dedicated mechanism for handling continuous data streams efficiently.
Proposed Solution
-
Graph Optimization:
- Implement lazy loading of graph nodes and edges.
-
Improved Execution Engine:
- Refactor the execution engine to support parallel processing of independent nodes.
- Implement a work-stealing scheduler for better load balancing across available resources.
-
Streaming Support:
- Introduce a new
StreamProcessorclass that can handle continuous data inputs. - Modify the
Graphclass to support infinite loops for stream processing. - Implement windowing operations for aggregating streaming data.
- Introduce a new
-
Testing:
- Develop comprehensive test suite for new features.
Additional Notes
- Very vague idea but we can explore adding a graph analysis step to identify opportunities for optimization (e.g., combining compatible nodes, identifying parallelizable sections).
- Performance benchmarks should be established before and after these changes to quantify the improvements.
Graph Optimization:
Implement lazy loading of graph nodes and edges.
Are you thinking map operations?
We don't create as many edges as the number of Tasks like Airflow does. Indexify Server creates tasks for map operations based on the number of outputs from a function. The python side doesn't create these objects in memory. It's simply code that gets executed
Improved Execution Engine:
Refactor the execution engine to support parallel processing of independent nodes. Implement a work-stealing scheduler for better load balancing across available resources.
All these already happen in Indexify server
Streaming Support:
We would be creating something like this - @indexify_genertor def foo(..) -> Generator[T] yield T
Testing:
Develop comprehensive test suite for new features.
Look at test_graph_behaviors.py
@diptanu yeah you're right, working with indexify sdk today and skimming the rust code makes me realize we can chuck Graph Optimization and Improved Execution Engine. those make no sense. I still think that Streaming Support can be helpful and yeah @indexify_generator decorator that you suggested seems like a better approach. It's simple, requires less modification to existing Indexify architecture, the api will be intuitive given the decorator approach is consistent with existing Indexify function decorators