[FEATURE]: Allow fetching from Repository by name or type
Contact Details [Optional]
Describe the feature you'd like
This is tangentially related to #726
Currently, when you want to fetch items from a Repository you need to use strings to identify the keys. I would like to be able to use the pipeline and step objects to fetch views from the repository.
In general, we'd like to limit how much we rely on strings agreeing by name. This lets refactoring tools do a better job, and it also means you never have to copy-paste strings, since your editor can auto-complete pipeline and step definition names.
e.g.
@step
def my_step():
    ...

@pipeline
def my_pipeline(my_step):
    ...

p = my_pipeline(my_step())
p.run()
repo = Repository()
I'd like the following to work:
pipeline_view = repo.get_pipeline(p)
pipeline_view = repo.get_pipeline(my_pipeline)
pipeline_view = repo.get_pipeline("my_pipeline")
step_view = pipeline_view.steps[my_step]
s = p.steps[...]
step_view = pipeline_view.steps[s]
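A minimal sketch of how such a lookup could behave, assuming the view stores entries by name and normalizes every key to a name before the lookup. `StepsView`, `FakeStep`, and `key_to_name` are hypothetical stand-ins made up for illustration, not ZenML classes; a real implementation would check against ZenML's own step and pipeline types rather than duck-typing.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class FakeStep:
    """Hypothetical stand-in for a step instance (not a ZenML class)."""

    def __init__(self, name: str) -> None:
        self.name = name


def my_step() -> None:
    """Hypothetical stand-in for a decorated step definition."""


def key_to_name(key: Any) -> str:
    """Normalize a name string, an instance, or a definition to a name."""
    if isinstance(key, str):
        return key
    name = getattr(key, "name", None)
    if isinstance(name, str):
        # Instance-like object exposing a `name` attribute.
        return name
    if hasattr(key, "__name__"):
        # Definition-like object (function or class).
        return key.__name__
    raise TypeError(type(key))


class StepsView(Mapping):
    """Read-only mapping that accepts names, instances, or definitions as keys."""

    def __init__(self, views: Dict[str, Any]) -> None:
        self._views = dict(views)

    def __getitem__(self, key: Any) -> Any:
        return self._views[key_to_name(key)]

    def __iter__(self) -> Iterator[str]:
        return iter(self._views)

    def __len__(self) -> int:
        return len(self._views)


steps = StepsView({"my_step": "view of my_step"})
same_view = steps["my_step"] == steps[my_step] == steps[FakeStep("my_step")]
```

Keeping the normalization in one function means the string form keeps working, while object keys become the form that editors and refactoring tools can follow.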
Is your feature request related to a problem?
Sort of. We're anticipating maintenance problems as we scale: step and pipeline names might change over time, and we wouldn't be able to catch all usages in notebooks. In our own codebase, we'd like to enforce never using string names via a custom flake8 code style plugin.
Imagine we've refactored myproj.pipelines.my_fancy_pipeline.my_fancy_pipeline -> myproj.pipelines.my_awesome_pipeline.my_awesome_pipeline. The following code would then fail loudly with an import error:
from myproj.pipelines.my_fancy_pipeline import my_fancy_pipeline
# ^^^ module not found
repo = Repository()
repo.get_pipeline(my_fancy_pipeline)
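For the lint rule mentioned above, the detection part could look roughly like the sketch below. It only covers finding offending calls with the standard `ast` module; wiring it into flake8 as a plugin is not shown. `GUARDED_METHODS` and `find_string_lookups` are hypothetical names, and the set of guarded methods is an assumption.

```python
import ast
from typing import Iterator, Tuple

# Assumption: the lookup methods that should never receive a string literal.
GUARDED_METHODS = {"get_pipeline"}


def find_string_lookups(source: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, method) for guarded calls given a string literal."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr in GUARDED_METHODS):
            continue
        # Check positional and keyword arguments alike.
        args = list(node.args) + [kw.value for kw in node.keywords]
        if any(isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args):
            yield node.lineno, node.col_offset, func.attr


example = '''
repo.get_pipeline("my_pipeline")
repo.get_pipeline(my_pipeline)
repo.get_pipeline(pipeline_name="my_pipeline")
'''
violations = sorted(find_string_lookups(example))
```

This only catches string literals at the call site; a name passed through a variable would slip past it, so it is a style nudge rather than a guarantee.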
How do you solve your current problem with the current status-quo of ZenML?
We've written the following utility code:
PipelineName = Union[str, BasePipeline, BasePipelineMeta]


@beta
def get_last_run(pipeline: PipelineName) -> PipelineRunView:
    """Hacky convenience method to fetch the last run of a pipeline.

    Notes
    -----
    This is inherently racy, but it's currently the only way to fetch the pipeline run details.
    This approach is also what's currently suggested in the docs.

    See Also
    --------
    https://github.com/zenml-io/zenml/issues/726
    """
    repo = Repository()
    return repo.get_pipeline(pipeline_name=to_pipeline_name(pipeline)).runs[-1]


@beta
def to_pipeline_name(pipeline: PipelineName) -> str:
    """Get a pipeline's name.

    This also supports custom named pipelines via `@pipeline(name="blah")`
    """
    if isinstance(pipeline, str):
        # treat `pipeline` as a pipeline name string
        return pipeline
    if isinstance(pipeline, BasePipeline):
        # `pipeline` is a connected instance, use its `name` property
        return pipeline.name
    if isinstance(pipeline, BasePipelineMeta):
        # `pipeline` is a decorated function.
        # This code duplicates code in zenml.pipelines.BasePipeline.__init__
        return pipeline.__name__
    else:
        raise TypeError(type(pipeline))
and the accompanying pytest tests:
@pipeline
def my_pipeline():
    ...


@pipeline(name="blah blah")
def my_custom_named_pipeline():
    ...


def test_to_pipeline_name():
    p = my_pipeline()
    name = "my_pipeline"
    assert_that(to_pipeline_name(name)).is_equal_to(name)
    assert_that(to_pipeline_name(my_pipeline)).is_equal_to(name)
    assert_that(to_pipeline_name(p)).is_equal_to(name)


def test_to_pipeline_name__with_custom_name():
    p = my_custom_named_pipeline()
    name = "blah blah"
    assert_that(to_pipeline_name(name)).is_equal_to(name)
    assert_that(to_pipeline_name(my_custom_named_pipeline)).is_equal_to(name)
    assert_that(to_pipeline_name(p)).is_equal_to(name)


def test_to_pipeline_name__with_unknown_type():
    with pytest.raises(TypeError) as e:
        # noinspection PyTypeChecker
        to_pipeline_name([])
    assert_that(e.value.args[0]).is_equal_to(list)
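As a possible variation on the utility above, the same three-way dispatch could be written with `functools.singledispatch` from the standard library. This is only a sketch: `FakePipelineMeta` and `FakePipeline` are hypothetical stand-ins so that it runs without ZenML, and it carries over the assumption from the utility that a decorated pipeline is a class whose `__name__` is the pipeline name.

```python
from functools import singledispatch


class FakePipelineMeta(type):
    """Hypothetical stand-in for BasePipelineMeta."""


class FakePipeline(metaclass=FakePipelineMeta):
    """Hypothetical stand-in for BasePipeline."""

    @property
    def name(self) -> str:
        return type(self).__name__


@singledispatch
def to_pipeline_name(pipeline) -> str:
    # Fallback for any type that has no registered handler.
    raise TypeError(type(pipeline))


@to_pipeline_name.register(str)
def _(pipeline):
    # Already a pipeline name.
    return pipeline


@to_pipeline_name.register(FakePipeline)
def _(pipeline):
    # A pipeline instance.
    return pipeline.name


@to_pipeline_name.register(FakePipelineMeta)
def _(pipeline):
    # A pipeline definition, i.e. a class created by the metaclass.
    return pipeline.__name__


# Mimic a custom-named pipeline by building the class with an explicit name.
custom = FakePipelineMeta("blah blah", (FakePipeline,), {})
```

Dispatch happens on the type of the argument, so a pipeline definition (whose type is the metaclass) and a pipeline instance land in different handlers without an `isinstance` chain.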
Any other comments?
No response
Hi @strangemonad, I'll have a look at this one and get back to you if any questions arise.
I have made some experimental changes to the repository; you can find the draft here. This is a work in progress, and I hope to get it into our next release in two weeks. Feel free to take it for a test drive and let me know if it's the behaviour you were looking for.