metaflow
Make it easier to migrate from Jupyter Notebook to Metaflow Steps
Most data scientists do their development in Jupyter Notebooks. My recommendation would be to use something like Kale, which lets you tag cells in a Jupyter Notebook, so that tagged cells are automatically turned into Metaflow steps. That would make the move from notebook to Metaflow steps much easier.
This idea was really taken from this article.
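To make the proposal concrete, here is a minimal sketch of what tag-driven conversion could look like. It is not how Kale or Metaflow works: the `step:<name>` tag convention and the helper names `tagged_steps` and `notebook_to_flow` are assumptions made up for illustration. It only handles a linear chain of steps that begins with `start` and finishes with `end`, and it assumes each cell already stores shared state on `self` (for example `self.x = 1`), since plain local variables would not carry over between steps.

```python
import json
import textwrap

STEP_TAG_PREFIX = "step:"  # hypothetical tag convention, not defined by Metaflow or Kale


def tagged_steps(notebook):
    """Return [(step_name, source)] for code cells tagged 'step:<name>', in notebook order."""
    steps = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        names = [t[len(STEP_TAG_PREFIX):] for t in tags if t.startswith(STEP_TAG_PREFIX)]
        if not names:
            continue  # untagged cells are treated as scratch work and skipped
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files may store source as a list of lines
            source = "".join(source)
        steps.append((names[0], source))
    return steps


def notebook_to_flow(notebook, flow_name="NotebookFlow"):
    """Generate the source text of a linear FlowSpec from the tagged cells."""
    steps = tagged_steps(notebook)
    names = [name for name, _ in steps]
    if not names or names[0] != "start" or names[-1] != "end":
        raise ValueError("tagged cells must begin with 'start' and finish with 'end'")
    if len(set(names)) != len(names):
        raise ValueError("step names must be unique")

    lines = ["from metaflow import FlowSpec, step", "", "", f"class {flow_name}(FlowSpec):"]
    for i, (name, source) in enumerate(steps):
        body = source.strip("\n") or "pass"
        if i + 1 < len(steps):
            # Chain each step to the next tagged cell; branching is out of scope here.
            body += f"\nself.next(self.{names[i + 1]})"
        lines += ["", "    @step", f"    def {name}(self):", textwrap.indent(body, " " * 8)]
    lines += ["", "", 'if __name__ == "__main__":', f"    {flow_name}()"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny inline notebook standing in for the contents of a real .ipynb file.
    sample = json.loads("""
    {"cells": [
      {"cell_type": "code", "metadata": {"tags": ["step:start"]}, "source": ["self.x = 1"]},
      {"cell_type": "code", "metadata": {}, "source": ["print('scratch')"]},
      {"cell_type": "code", "metadata": {"tags": ["step:end"]}, "source": ["print(self.x)"]}
    ]}
    """)
    print(notebook_to_flow(sample))
```

The generated text would still need to be saved to a file and run with Metaflow installed. The sketch leaves out branching, joins, and decorators entirely.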
Thanks for the links @jimmycfa. We have been thinking about the topic for a long time. While technically grabbing code from cells is quite doable, it is less clear how one should version control the code, define a complex DAG, resume executions, etc.
Metaflow is definitely meant to be used in conjunction with notebooks. At Netflix, we use notebooks to inspect results using the Client API. We also have a @notebook decorator internally, which we are planning to open-source; it automatically executes notebooks when a Metaflow step finishes.
Long story short, this idea is on our radar. Stay tuned!
Hi! Has there been any further thinking on this? We're in a similar position to @jimmycfa and it would be really useful to have some way of more easily moving code from notebooks into Metaflow.
We currently have hundreds of tasks on our platform that require bulk updates. Managing them solely through commands makes it difficult to integrate with other systems. Is it possible to address this issue in the next version? Introducing a class and directly invoking a specific method within the class to initiate a Flow would be a significant enhancement.
@jixianyihao good feedback. There are various ways to do such updates today. Can you share more details about what you need to update: Parameters, decorators, business logic, or something else?
We provide a self-service analysis feature. Users write code in our Jupyter notebooks and release it as a data API in a specified format. The code is stored on NFS. However, this design does not cover ETL or ad hoc query processes. Now we want to integrate Metaflow's FlowSpec, so that users write the workflow and we publish it. We have run into the following problems:
- A flow cannot be invoked and started like an API. We would like to start it directly by calling a method on the class: for example, if FlowSpec provided a run() method, we could instantiate the flow and call that method directly.
- When a card is used, user-defined parameters cannot be passed in. This is very important because our previous API lets users pass in user-defined parameters and process the results.
I'm a developer myself, and recently I've been working on the Metaflow source code.
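On the first problem above (starting a flow like an API), one possible workaround is to wrap the command line in a small Python function and call that from other systems. The sketch below assumes the flow lives in a file such as `flow.py` and that its parameters are passed as `--name value` options to the `run` command. The helper names `build_run_command` and `run_flow` are made up for illustration and are not part of Metaflow.

```python
import subprocess
import sys


def build_run_command(flow_file, parameters=None, python=sys.executable):
    """Build the argv for `python <flow_file> run --name value ...`."""
    command = [python, flow_file, "run"]
    for name, value in (parameters or {}).items():
        # Assumption: every parameter is a simple `--name value` pair.
        command += [f"--{name}", str(value)]
    return command


def run_flow(flow_file, parameters=None):
    """Start the flow in a child process and return its exit code."""
    completed = subprocess.run(build_run_command(flow_file, parameters), check=False)
    return completed.returncode
```

A call such as `run_flow("flow.py", {"alpha": 0.5})` would then behave like running `python flow.py run --alpha 0.5` in a shell. This keeps each run in its own process, at the cost of only getting an exit code back rather than a Python object.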