metaflow
Running Metaflow from code instead of from the CLI
Hello,
I have a use case where I want to run a flow from within our existing code base, doing the equivalent of "python filename.py run", which we currently run from the CLI.
I looked at metaflow.client.Run, but there is no proper documentation around it, so I am confused about how to implement this.
@kgrvamsi You can use subprocess.run to launch a flow from within a Python script. #116
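A minimal sketch of that suggestion, assuming the flow lives in filename.py next to the calling script. The helper names build_run_command and launch_flow are hypothetical, and passing flow parameters as "--name value" pairs assumes the flow declares matching Parameters.

```python
import subprocess
import sys


def build_run_command(flow_file, **params):
    """Build the argv list equivalent to `python flow_file run --name value ...`."""
    # sys.executable keeps the child on the same interpreter (and virtualenv).
    cmd = [sys.executable, flow_file, "run"]
    for name, value in params.items():
        # Assumption: each keyword corresponds to a Parameter declared on the flow.
        cmd += [f"--{name}", str(value)]
    return cmd


def launch_flow(flow_file, **params):
    """Run the flow in a child process and return the CompletedProcess."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    return subprocess.run(
        build_run_command(flow_file, **params),
        check=True,
        capture_output=True,
        text=True,
    )
```

Calling launch_flow("filename.py") would then block until the run finishes and expose the flow's console output on the returned object's stdout attribute.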
@savingoyal What do you recommend for this approach?
from metaflow import Flow, FlowSpec, step, get_metadata
from metaflow.client import Run

class BasicFlow(FlowSpec):

    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start'.
        """
        print("Metaflow says: Hi!")
        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step.
        """
        print("My flow is all done.")

if __name__ == '__main__':
    run = Flow('BasicFlow').latest_run
    k = run.steps()
    for data in k:
        print(data.task.stdout)
This is a very valid point. I have been looking at using this library as a base for our scientific methods to execute on. The only thing stopping me from using this library is the lack of support for being able to execute the pipelines programmatically (from code, and not from cli).
I have seen similar issues raised here already, such as #309 and a few similar ones.
As for my use case, I would see the pipelines being loaded and executed dynamically in Celery workers. Using subprocess is not a solution for production code 👎 I am afraid.
Are there any updates on the progress of this feature implementation, please?
@den4uk We are actively thinking about programmatic local execution of Metaflow flows. I will update the thread as we start making further progress with the design memo. Can you elaborate on why subprocess wouldn't work for you?
Let's be frank, executing Python files from Python using subprocess is a hacky approach.
That's not what anybody would do in production code.
In reality, you would want to do the following, which you can't achieve with subprocess:
- Thread locking, so being able to track the request via correlation IDs from the source to execution and follow-on stages.
- Passing context objects to the pipeline (such as connections, logging).
- Handling errors/exceptions when running in a non-managed/detached manner would be far too troublesome.
There are plenty more reasons, I believe, but just a few that came to my mind.
@den4uk Metaflow uses subprocess behind the scenes and it is very likely that the programmatic execution capability will be a wrapper on top of subprocess.
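To make the trade-off concrete, here is a rough sketch of what a thin wrapper on top of subprocess could look like. It is not Metaflow's implementation. The names FlowRunError, run_with_context, and the CORRELATION_ID environment variable are all hypothetical, and the sketch assumes the flow's steps run locally and inherit the parent's environment.

```python
import os
import subprocess
import sys
import uuid


class FlowRunError(RuntimeError):
    """Raised when the child process running the flow exits with an error."""

    def __init__(self, correlation_id, returncode, stderr):
        super().__init__(f"[{correlation_id}] flow exited with code {returncode}")
        self.correlation_id = correlation_id
        self.returncode = returncode
        self.stderr = stderr


def run_with_context(cmd, correlation_id=None, extra_env=None):
    """Run cmd in a child process, tagged with a correlation ID.

    Returns (correlation_id, stdout) on success, raises FlowRunError otherwise.
    """
    correlation_id = correlation_id or uuid.uuid4().hex
    env = dict(os.environ)
    env.update(extra_env or {})
    # Hypothetical variable name; the flow's steps would read it from os.environ.
    env["CORRELATION_ID"] = correlation_id
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    if proc.returncode != 0:
        # Turn a failed exit status into a regular Python exception for the caller.
        raise FlowRunError(correlation_id, proc.returncode, proc.stderr)
    return correlation_id, proc.stdout
```

A caller could use run_with_context([sys.executable, "filename.py", "run"]) and log the returned ID. Only serializable values cross the process boundary this way, so live objects such as open connections or logger instances still cannot be handed to the flow.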
@savingoyal Is there any update on calling Metaflow programmatically instead of going through subprocess?
Here is the memo - https://docs.google.com/document/d/1HJW9TH6lHEUojqDTzfgJZJXgjg3xZr43-p8_vjYJM1c/edit#heading=h.thnxk5pwrpsa
"We currently have hundreds of tasks on our platform that require bulk updates. Managing them solely through commands makes it difficult to integrate with other systems. Is it possible to address this issue in the next version? Introducing a class and directly invoking a specific method within the class to initiate a Flow would be a significant enhancement."
Please see https://outerbounds.com/blog/metaflow-in-notebooks-and-scripts/
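For readers skimming the thread: as far as I understand, the linked post introduces a Runner API for launching flows from scripts and notebooks. A minimal sketch follows, assuming a Metaflow release that ships metaflow.Runner and that keyword arguments to run() map to the flow's Parameters. The helper name run_flow_with_runner is hypothetical.

```python
def run_flow_with_runner(flow_file, **params):
    """Run a flow through Metaflow's Runner and return the client Run object."""
    # Imported lazily so this module can be imported without Metaflow installed.
    from metaflow import Runner

    # run() blocks until the flow finishes; the context manager cleans up
    # the Runner's temporary resources when the block exits.
    with Runner(flow_file).run(**params) as running:
        # running.run is the same kind of object that Flow(...).latest_run returns.
        return running.run
```

With the flow from earlier in the thread saved as filename.py, run_flow_with_runner("filename.py") should return a Run whose steps and task output can be inspected through the client API.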