metricflow icon indicating copy to clipboard operation
metricflow copied to clipboard

Dependency injection mechanisms

Open mazinesy opened this issue 2 years ago • 1 comments

Describe the Feature Currently, a lot of visitors are instantiated inside the init function without the ability to swap implementation. So we are not able to leverage the strength of the visitor pattern. For instance:

class MetricFlowEngine(AbstractMetricFlowEngine):
    def __init__(...):
        ... # some code

        # hard coded here without the ability to inject it
        self._to_sql_query_plan_converter = DataflowToSqlQueryPlanConverter[DataSourceDataSet](
            column_association_resolver=self._column_association_resolver,
            semantic_model=self._semantic_model,
            time_spine_source=self._time_spine_source,
        )
        self._to_execution_plan_converter = DataflowToExecutionPlanConverter[DataSourceDataSet](
            sql_plan_converter=self._to_sql_query_plan_converter,
            sql_plan_renderer=self._sql_client.sql_engine_attributes.sql_query_plan_renderer,
            sql_client=sql_client,
        )

Enabling dependency injection would allow the use of different visitor concretions. For instance, we could have a DenormalizedToSqlQueryPlanConverter injected instead of the usual DataflowToSqlQueryPlanConverter.

This also implies that some class must rely on the proper abstraction. For instance,

class DataflowToExecutionPlanConverter(Generic[SqlDataSetT], SinkNodeVisitor[SqlDataSetT, ExecutionPlan]):
    """Converts a dataflow plan to an execution plan"""

    def __init__(
        self,
        # uses the abstraction here instead of the `DataflowToSqlQueryPlanConverter` concretion
        sql_plan_converter: DataflowPlanNodeVisitor[SqlDataSetT, SqlDataSet],
        sql_plan_renderer: SqlQueryPlanRenderer,
        sql_client: SqlClient,
        output_column_name_overrides: Tuple[OutputColumnNameOverride, ...] = (),
    ) -> None:

Would you like to contribute? Yes

Anything Else? Add any other context or screenshots about the feature request here.

mazinesy avatar Sep 27 '22 18:09 mazinesy

Yes! In general I am a fan of this model, it's what I'm used to as well, although I wouldn't use the base classes as types here - I'd rather define a set of protocols with entry methods that make it obvious what our output expectation is for each branch of visitors. For example, we could make DataflowToSqlQueryPlanConverter into a Protocol, perhaps one extending a more general DataflowPlanNodeVisitor Protocol, and our current class would then be an implementation called something like MetricQueryDataflowToSqlQueryPlanConverter. Then the sql_plan_converter in your updated __init__ would have an entry point API that takes a set of DataflowPlanNodes and returns a set of SqlQueryPlanNodes, instead of returning a generic object. That way it doesn't even need to use a visitor pattern under the hood, although that is our current convention.

More broadly, we'd benefit from increasing our use of Protocols and other general type specifications, and relying on those instead of concrete implementations embedded in an inheritance hierarchy. The visitors are, by fundamental requirement of the design pattern, typed in that way but there are a lot of things that get rolled up into the visitor logic that could be disentangled, modularized, and conformed into a type specification in its own right. Our join resolution, for example, could probably be forked out and better organized with this approach to typing.

To sum up, I like it. Let's do it. Happy to pair up a bit early on if you think that'd help!

tlento avatar Sep 29 '22 21:09 tlento