elyra
create public interface for user-defined "generic" components
Generic components are powerful because the same component can run both locally and in a Kubeflow or Airflow pipeline. This makes it easier to develop iteratively by running the pipeline locally, rather than spamming your Kubeflow/Airflow cluster with jobs.
We can provide a public interface for people to define their own "generic" components in addition to the built-in ones we already have (Jupyter Notebook, Python Script, R Script).
Users could implement their "generic" components by implementing a Python class with methods like:
- run_on_local(self) -> OperationProcessor
- run_on_kubeflow(self) -> kfp.dsl.ContainerOp
- run_on_airflow(self) -> ElyraAirflowOperation
  - Current code that generates the airflow operation dict from GenericOperation
  - (NOTE: we should create an actual class like ElyraAirflowOperation rather than using a dictionary)
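To make the shape of that interface concrete, here is a minimal sketch of what the base class could look like. Everything in it is an assumption for illustration: `ElyraGenericComponent` and `ElyraAirflowOperation` are the names proposed in this issue and do not exist in Elyra, the fields on `ElyraAirflowOperation` are guessed, `EchoComponent` is a made-up example, and the return types are loosened to `Any` so the sketch runs without Elyra or KFP installed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ElyraAirflowOperation:
    """Hypothetical typed replacement for the Airflow operation dictionary."""

    class_name: str
    component_params: Dict[str, Any] = field(default_factory=dict)


class ElyraGenericComponent(ABC):
    """Hypothetical base class with one abstract method per runtime."""

    @abstractmethod
    def run_on_local(self) -> Any:
        """Return something that executes the component locally."""

    @abstractmethod
    def run_on_kubeflow(self) -> Any:
        """Return the Kubeflow Pipelines form of the component."""

    @abstractmethod
    def run_on_airflow(self) -> ElyraAirflowOperation:
        """Return the Airflow form of the component."""


class EchoComponent(ElyraGenericComponent):
    """Made-up component used only to show the contract being fulfilled."""

    def run_on_local(self) -> Any:
        return lambda: print("hello")

    def run_on_kubeflow(self) -> Any:
        # Placeholder value; a real component would return a ContainerOp.
        return {"image": "python:3.9"}

    def run_on_airflow(self) -> ElyraAirflowOperation:
        return ElyraAirflowOperation(
            class_name="airflow.operators.python.PythonOperator",
            component_params={"python_callable": "lambda: print('hello')"},
        )
```

Because the methods are abstract, a subclass that forgets one of the runtimes cannot be instantiated, which would surface an incomplete component early rather than at pipeline submission time.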
The available user-inputs (for generating the node-properties UI) could be defined by implementing "property" methods on this class.
We can then provide @xxxx decorators for each type of UI input we have ("dropdown", "list", "checkbox", etc).
For example, @elyra.dropdown(options=["option_1","option_2"], default="option_1") would display a dropdown and would pass parameters like selected_option to the annotated method.
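As a rough illustration of how such a decorator might work, the sketch below records the UI metadata on the method and injects `selected_option` when the method is called. This is only one possible design: the `dropdown` function stands in for the proposed `@elyra.dropdown`, and the `ui_values` attribute and `ui_property` metadata are hypothetical names invented for this example.

```python
import functools
from typing import Callable, List


def dropdown(display_name: str, options: List[str], default: str) -> Callable:
    """Hypothetical stand-in for the proposed `@elyra.dropdown` decorator."""
    if default not in options:
        raise ValueError(f"default {default!r} is not one of {options!r}")

    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(self):
            # Use the value chosen in the UI if present, else the default.
            # `ui_values` is an assumed attribute holding the node properties.
            chosen = getattr(self, "ui_values", {}).get(method.__name__, default)
            if chosen not in options:
                raise ValueError(f"{chosen!r} is not one of {options!r}")
            return method(self, selected_option=chosen)

        # Metadata a front end could read to render the node-properties UI.
        wrapper.ui_property = {
            "type": "dropdown",
            "display_name": display_name,
            "options": list(options),
            "default": default,
        }
        return wrapper

    return decorator


class Greeter:
    def __init__(self, ui_values=None):
        self.ui_values = ui_values or {}

    @dropdown(display_name="Greeting Text", options=["morning", "night"], default="morning")
    def greeting_text(self, selected_option: str) -> str:
        return f"Good {selected_option}, World!"
```

With this design, callers invoke `greeting_text()` without arguments and the decorator supplies `selected_option`, while `Greeter.greeting_text.ui_property` carries what the UI needs to draw the dropdown.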
Here is a very rough implementation of a generic-component class with one dropdown input called greeting_text that simply runs a print() function:
import kfp
from elyra.pipeline.local.processor_local import OperationProcessor
from elyra.pipeline.pipeline import GenericOperation

# NOTE: `elyra.dropdown`, `ElyraGenericComponent` and `ElyraAirflowOperation` are part of
# this proposal and do not exist in Elyra yet.


class MyGenericComponent(ElyraGenericComponent):

    def run_on_local(self) -> OperationProcessor:
        # `CustomOperationProcessor` is a custom subclass of `OperationProcessor`
        class CustomOperationProcessor(OperationProcessor):
            def __init__(self, text_to_print: str):
                self.text_to_print = text_to_print
                super().__init__()

            def process(self, operation: GenericOperation, elyra_run_name: str):
                print(self.text_to_print)

        operation_processor = CustomOperationProcessor(
            text_to_print=self.greeting_text()
        )
        return operation_processor

    def run_on_kubeflow(self) -> kfp.dsl.ContainerOp:
        # KFP builds the component from the function's source code,
        # so a named function is used here instead of a lambda
        def print_text(text_to_print: str):
            print(text_to_print)

        container_op_factory = kfp.components.create_component_from_func(
            func=print_text,
            base_image='python:3.9'
        )
        container_op = container_op_factory(
            text_to_print=self.greeting_text()
        )
        return container_op

    def run_on_airflow(self) -> ElyraAirflowOperation:
        # `ElyraAirflowOperation` is a class that replaces the current dictionary we use to pass
        # the list of operations for the "airflow_template.jinja2" template
        elyra_airflow_operation = ElyraAirflowOperation(
            class_name="airflow.operators.python.PythonOperator",
            # `!r` quotes the text so the rendered lambda is valid Python
            component_params={"python_callable": f"lambda: print({self.greeting_text()!r})"}
        )
        return elyra_airflow_operation

    # the decorator is expected to supply `selected_option` from the UI
    @elyra.dropdown(display_name="Greeting Text", options=["morning", "night"], default="morning")
    def greeting_text(self, selected_option: str) -> str:
        if selected_option == "morning":
            return "Good morning, World!"
        elif selected_option == "night":
            return "Good night, World!"
        else:
            raise ValueError(f"unexpected option: {selected_option}")
@akchinSTC @ptitzler any thoughts on whether the above proposal is acceptable?
I think this is a very useful feature and will really set Elyra apart as a "generic" abstraction for pipelines.
@akchinSTC I have added this to the 4.0.0 milestone.
A public interface for "generic components" is a very valuable feature that no other pipeline tool has. Adding it would make Elyra a powerful high-level abstraction above Airflow, Kubeflow and Local-Python.
This is NOT to say that we must use the specific proposal above, just that we should consider how best to achieve user-provided "generic components" for the 4.0.0 release.
What would be a concrete example of a "bring your own generic component" that can't be exposed as either a script or a notebook?
The issue is that runtimes are an extension point, and we have already seen a few runtime implementations written by users, so the "run_on_xxx" approach won't scale well.
Also, generic components will have to reinvent the new KFP APIs, and we might move away from it.
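The scalability concern about "run_on_xxx" can be pictured with a small sketch. It assumes, purely for illustration, that the pipeline processor finds the right method by runtime name; `PrintComponent`, `build_for_runtime` and the runtime name `my_custom_runtime` are all hypothetical and not part of Elyra.

```python
class PrintComponent:
    """Hypothetical component that only knows the three built-in runtimes."""

    def run_on_local(self) -> str:
        return "local"

    def run_on_kubeflow(self) -> str:
        return "kubeflow"

    def run_on_airflow(self) -> str:
        return "airflow"


def build_for_runtime(component: object, runtime_name: str) -> str:
    """Look up the `run_on_<runtime>` method that the proposal relies on."""
    method = getattr(component, f"run_on_{runtime_name}", None)
    if method is None:
        # A user-provided runtime has no matching method on existing components.
        raise NotImplementedError(
            f"{type(component).__name__} has no run_on_{runtime_name}()"
        )
    return method()
```

Here `build_for_runtime(PrintComponent(), "airflow")` works, but asking for `"my_custom_runtime"` raises `NotImplementedError`: every existing component would need a new method each time someone adds a runtime.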