
Running Metaflow from code instead of from the CLI

Open kgrvamsi opened this issue 4 years ago • 9 comments

Hello,

I have a use case where I want to run a flow from within our existing code base, doing the equivalent of the "python filename.py run" command that we normally run from the CLI.

I have been looking at metaflow.client.Run, but there is no proper documentation around it, so I am confused about how to implement this.

kgrvamsi avatar Jan 27 '21 19:01 kgrvamsi

@kgrvamsi You can use subprocess.run to launch a flow from within a python script. #116
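
A minimal sketch of the subprocess approach, for illustration only. The helper name `run_flow` is hypothetical, and the sketch assumes the flow lives in its own script that is normally started with "python filename.py run":

```python
import subprocess
import sys


def run_flow(flow_file, *extra_args):
    """Launch `python <flow_file> run <extra_args...>` as a child process.

    `run_flow` is a hypothetical helper for this sketch, not a Metaflow API.
    """
    # Use the current interpreter so the child sees the same environment.
    cmd = [sys.executable, flow_file, "run", *extra_args]
    # capture_output=True collects stdout/stderr; text=True decodes them to str.
    return subprocess.run(cmd, capture_output=True, text=True)
```

Calling `run_flow("filename.py")` returns a `subprocess.CompletedProcess`, so the caller can inspect `returncode`, `stdout`, and `stderr` once the flow has finished.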

savingoyal avatar Jan 27 '21 19:01 savingoyal

@savingoyal What do you think of this approach?

from metaflow import Flow, FlowSpec, step, get_metadata
from metaflow.client import Run

class BasicFlow(FlowSpec):
  @step
  def start(self):
    """
    This is the 'start' step. All flows must have a step named 'start'.
    """
    print("Metaflow says: Hi!")
    self.next(self.end)

  @step
  def end(self):
    """
    This is the 'end' step. All flows must have an 'end' step.
    """
    print("My flow is all done.")

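if __name__ == '__main__':
  # Note: this reads the most recent existing run of BasicFlow through the
  # client API and prints each step's stdout; it does not start a new run.
  run = Flow('BasicFlow').latest_run
  for step_obj in run.steps():
    print(step_obj.task.stdout)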

kgrvamsi avatar Jan 31 '21 05:01 kgrvamsi

This is a very valid point. I have been looking at using this library as a base for our scientific methods to execute on. The only thing stopping me from using this library is the lack of support for being able to execute the pipelines programmatically (from code, and not from cli).

I have seen similar issues raised here already, such as #309 and a few similar ones.

In my use case, the pipelines would be loaded and executed dynamically in Celery workers. I am afraid using subprocess is not a solution for production code 👎.

Are there any updates on the progress of this feature implementation, please?

den4uk avatar Apr 08 '21 20:04 den4uk

@den4uk We are actively thinking about programmatic local execution of Metaflow flows. I will update the thread as we start making further progress with the design memo. Can you elaborate on why subprocess wouldn't work for you?

savingoyal avatar Apr 08 '21 20:04 savingoyal

Let's be frank: executing Python files from Python using subprocess is hacky. That's not what anybody would do in production code.

In reality, you would want the following, none of which you can achieve with subprocess:

  • Thread locking, and being able to track a request via correlation IDs from the source through execution and follow-on stages.
  • Passing context objects to the pipeline (such as connections, logging).
  • Handling errors/exceptions when running in a non-managed/detached manner would be far too troublesome.

There are plenty more reasons, I believe, but just a few that came to my mind.

den4uk avatar Apr 09 '21 07:04 den4uk

@den4uk Metaflow uses subprocess behind the scenes and it is very likely that the programmatic execution capability will be a wrapper on top of subprocess.
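
As a rough illustration of what a thin wrapper on top of subprocess could look like, here is a sketch that passes a correlation ID to the child process and turns a failed run into a Python exception. This is not Metaflow's API: `run_flow_checked`, `FlowRunError`, and the `CORRELATION_ID` environment variable are all made-up names for this example.

```python
import os
import subprocess
import sys


class FlowRunError(RuntimeError):
    """Hypothetical exception raised when the child process exits non-zero."""


def run_flow_checked(flow_file, correlation_id, extra_args=()):
    # Pass context to the child through an environment variable.
    # CORRELATION_ID is an assumed name; the flow would have to read it itself.
    env = dict(os.environ, CORRELATION_ID=correlation_id)
    proc = subprocess.run(
        [sys.executable, flow_file, "run", *extra_args],
        env=env,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Surface the failure to the caller as a regular exception.
        raise FlowRunError(
            f"[{correlation_id}] exit {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout
```

Only string-like context survives the process boundary this way; live objects such as open connections or logger instances cannot be handed over, which is the limitation raised earlier in the thread.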

savingoyal avatar Apr 12 '21 04:04 savingoyal

@savingoyal Is there any update on calling Metaflow programmatically instead of going through subprocess?

kgrvamsi avatar Aug 01 '21 17:08 kgrvamsi

Here is the memo - https://docs.google.com/document/d/1HJW9TH6lHEUojqDTzfgJZJXgjg3xZr43-p8_vjYJM1c/edit#heading=h.thnxk5pwrpsa

savingoyal avatar Aug 31 '22 15:08 savingoyal

"We currently have hundreds of tasks on our platform that require bulk updates. Managing them solely through commands makes it difficult to integrate with other systems. Is it possible to address this issue in the next version? Introducing a class and directly invoking a specific method within the class to initiate a Flow would be a significant enhancement."

jixianyihao avatar Aug 27 '23 12:08 jixianyihao

please see - https://outerbounds.com/blog/metaflow-in-notebooks-and-scripts/

savingoyal avatar Jun 17 '24 19:06 savingoyal