metricflow
metricflow copied to clipboard
Dependency injection mechanisms
Describe the Feature
Currently, a lot of visitors
are instantiated inside the init
function without the ability to swap implementation. So we are not able to leverage the strength of the visitor
pattern. For instance:
class MetricFlowEngine(AbstractMetricFlowEngine):
def __init__(...):
... # some code
# hard coded here without the ability to inject it
self._to_sql_query_plan_converter = DataflowToSqlQueryPlanConverter[DataSourceDataSet](
column_association_resolver=self._column_association_resolver,
semantic_model=self._semantic_model,
time_spine_source=self._time_spine_source,
)
self._to_execution_plan_converter = DataflowToExecutionPlanConverter[DataSourceDataSet](
sql_plan_converter=self._to_sql_query_plan_converter,
sql_plan_renderer=self._sql_client.sql_engine_attributes.sql_query_plan_renderer,
sql_client=sql_client,
)
Enabling dependency injection would allow the use of different visitor concretions. For instance, we could have a DenormalizedToSqlQueryPlanConverter
injected instead of the usual DataflowToSqlQueryPlanConverter
.
This also implies that some class must rely on the proper abstraction. For instance,
class DataflowToExecutionPlanConverter(Generic[SqlDataSetT], SinkNodeVisitor[SqlDataSetT, ExecutionPlan]):
"""Converts a dataflow plan to an execution plan"""
def __init__(
self,
# uses the abstraction here instead of the `DataflowToSqlQueryPlanConverter` concretion
sql_plan_converter: DataflowPlanNodeVisitor[SqlDataSetT, SqlDataSet],
sql_plan_renderer: SqlQueryPlanRenderer,
sql_client: SqlClient,
output_column_name_overrides: Tuple[OutputColumnNameOverride, ...] = (),
) -> None:
Would you like to contribute? Yes
Anything Else? Add any other context or screenshots about the feature request here.
Yes! In general I am a fan of this model, it's what I'm used to as well, although I wouldn't use the base classes as types here - I'd rather define a set of protocols with entry methods that make it obvious what our output expectation is for each branch of visitors. For example, we could make DataflowToSqlQueryPlanConverter
into a Protocol, perhaps one extending a more general DataflowPlanNodeVisitor
Protocol, and our current class would then be an implementation called something like MetricQueryDataflowToSqlQueryPlanConverter
. Then the sql_plan_converter
in your updated __init__
would have an entry point API that takes a set of DataflowPlanNodes and returns a set of SqlQueryPlanNodes, instead of returning a generic object. That way it doesn't even need to use a visitor pattern under the hood, although that is our current convention.
More broadly, we'd benefit from increasing our use of Protocols and other general type specifications, and relying on those instead of concrete implementations embedded in an inheritance hierarchy. The visitors are, by fundamental requirement of the design pattern, typed in that way but there are a lot of things that get rolled up into the visitor logic that could be disentangled, modularized, and conformed into a type specification in its own right. Our join resolution, for example, could probably be forked out and better organized with this approach to typing.
To sum up, I like it. Let's do it. Happy to pair up a bit early on if you think that'd help!