metaflow
Support for Airflow and Kubernetes
An open-source version of issue #2 -- would love to be able to have Metaflow plugins that support Airflow and Kubernetes!
We currently deploy our machine learning models to Kubernetes as restful API-wrapped microservices, then create Airflow dags to orchestrate and schedule the execution of all the model components.
Admittedly not entirely familiar with what all Metaflow offers just yet, but would love to see seamless integrations with these other awesome open-source tools!
@JoshZastrow I thought that Metaflow could integrate with Kubeflow, which is the machine learning toolkit for Kubernetes.
@JoshZastrow Thanks for opening the issue! Yes, we are evaluating and prioritizing our roadmap currently.
And what about using Argo instead of Airflow (https://github.com/argoproj/argo)? Can it be included in this issue, or should it be a separate one?
@nlaille Let's track that as a separate issue so that people can vote and weigh in with their opinions.
IMHO this issue is too broad. Let me separate the use of Airflow with and without Kubernetes. You probably don't need Metaflow if you're using Airflow with Kubernetes. You may need Metaflow as an Airflow executor and an Airflow operator if you're using Airflow without Kubernetes.
"Admittedly not entirely familiar with what all Metaflow offers just yet"
I love open-source software and solutions, including Airflow, which I use, but I believe this issue should be closed unless the OP can substantiate what Metaflow would meaningfully add to the Airflow-with-Kubernetes combo.
The orchestration part could be a cloud service like AWS Step Functions, a container-based orchestrator like Argo, or another orchestrator like Airflow.
One reasonable option is to map the Metaflow DAG to a Step Functions, Argo, or Airflow DAG and execute it remotely. Computing resources would need to be changed correspondingly. I totally agree with @impredicative's point: unless users have clear requirements, this integration is not that meaningful.
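To make the "map the DAG" idea concrete, here is a minimal sketch, not Metaflow's actual compiler. It assumes a flow can be described as a plain dictionary of step names and their successors; the `FLOW` graph, the `to_scheduler_tasks` helper, and the `task_id`/`depends_on` fields are all hypothetical names chosen for illustration. A real integration would emit Step Functions states, Argo templates, or Airflow operators from this kind of intermediate list.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow graph: step name -> steps that run after it.
FLOW = {
    "start": ["train_a", "train_b"],
    "train_a": ["join"],
    "train_b": ["join"],
    "join": ["end"],
    "end": [],
}


def to_scheduler_tasks(flow):
    """Turn a step graph into scheduler-neutral task specs.

    Each spec lists the upstream tasks it waits for, which is the
    information every target scheduler needs in some form.
    """
    # Invert the successor map into a predecessor map.
    upstream = {name: [] for name in flow}
    for name, successors in flow.items():
        for nxt in successors:
            upstream[nxt].append(name)

    # TopologicalSorter expects node -> predecessors and raises
    # CycleError if the graph is not a DAG.
    order = TopologicalSorter(upstream).static_order()
    return [
        {"task_id": name, "depends_on": sorted(upstream[name])}
        for name in order
    ]


TASKS = to_scheduler_tasks(FLOW)
for task in TASKS:
    print(task["task_id"], "<-", task["depends_on"])
```

The list is ordered so that every task appears after the tasks it depends on, which is enough to generate either a static DAG definition or a sequence of submit calls.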
I would second @impredicative 's comment that this is probably too broad.
In particular, I think there's potential, independent value in having a plugin implementation of a k8s CLI, compute environment, and decorator. Based on a quick scan, it doesn't seem like there's too much functionality to implement: just make a Kubernetes Job definition, come up with an annotation scheme (probably something similar to what Airflow does), and handle cleanup. Drop in some example RBAC templates and you're probably good to go.
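As a rough illustration of how small that surface could be, the sketch below builds a `batch/v1` Job manifest as a plain Python dictionary. It is an assumption-laden example, not the plugin that was eventually shipped: the `build_job_manifest` helper, the `example.org/...` annotation keys, the image, and the step command are hypothetical. The Job fields themselves (`backoffLimit`, `ttlSecondsAfterFinished`, `restartPolicy`, `resources.requests`) are standard Kubernetes fields, with `ttlSecondsAfterFinished` standing in for the cleanup step.

```python
import json


def build_job_manifest(flow_name, step_name, run_id, image, command,
                       cpu="1", memory="4Gi"):
    """Return a Kubernetes batch/v1 Job manifest for one flow step."""
    # Job names must be lowercase DNS-style names, so normalize here.
    name = f"{flow_name}-{run_id}-{step_name}".lower().replace("_", "-")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Hypothetical annotation scheme for tracing a Job back to its run.
            "annotations": {
                "example.org/flow-name": flow_name,
                "example.org/run-id": run_id,
                "example.org/step-name": step_name,
            },
        },
        "spec": {
            "backoffLimit": 0,                # let the caller decide on retries
            "ttlSecondsAfterFinished": 3600,  # cluster deletes the Job after 1 hour
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "step",
                        "image": image,
                        "command": command,
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }


# Hypothetical image and command, for illustration only.
MANIFEST = build_job_manifest(
    "TrainFlow", "train_a", "42",
    image="python:3.11",
    command=["python", "flow.py", "step", "train_a"],
)
print(json.dumps(MANIFEST, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.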
I think it would probably be fine to stop at container/job orchestration, and leave things like cluster autoscaling to pointers to existing k8s docs and tools.
The scheduler (e.g., Airflow, AWS Step Functions, or Argo) seems like a separate discussion that's out of scope for a question about Kubernetes.
Why not compile to Kubeflow Pipelines via an intermediate representation (IR) [1]?
[1] https://github.com/kubeflow/pipelines/issues/3703
@talebzeghmi Yes, an IR for KfP would be great. Is there an RFC for it? We are happy to contribute our thoughts.
For folks following this thread, we recently announced equivalent support for AWS Step Functions. Here is an article with more details.
"There are existing mechanisms for triggering workflows based on external events."
For clarity, what are these? What if I want to trigger it on a schedule like Airflow allows me to do?
For AWS Step Functions, we provide time-based triggers out of the box right now. You can very easily configure other triggers (say data availability in S3 using Amazon EventBridge).
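For readers wondering what a data-availability trigger might look like, here is a small sketch of an EventBridge event pattern that matches new objects in an S3 bucket. It assumes the bucket has EventBridge notifications enabled and that the rule's target is the deployed state machine. The bucket name, key prefix, and the `s3_object_created_pattern` helper are hypothetical, and creating the rule and target is left out.

```python
import json


def s3_object_created_pattern(bucket, key_prefix):
    """Build an EventBridge event pattern for new S3 objects under a prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            # Prefix matching limits the rule to one "folder" in the bucket.
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


# Hypothetical bucket and prefix, for illustration only.
PATTERN = s3_object_created_pattern("my-training-data", "daily/")
PATTERN_JSON = json.dumps(PATTERN, indent=2)
print(PATTERN_JSON)
```

The resulting JSON is what would go into the rule's event pattern. A time-based rule would use a schedule expression instead, for example `cron(0 6 * * ? *)` for 06:00 UTC every day.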
@savingoyal is there any support for event-based triggers? (e.g. REST API)
@lucianoviola Yes, you can use Amazon EventBridge to do event-based triggering of Step Functions workflows.
https://github.com/Netflix/metaflow/issues/50#issuecomment-946254343 If you would like to try out and give feedback on our Kubernetes integration, please reach out at http://slack.outerbounds.co
#992 provides GA support for Kubernetes. https://github.com/outerbounds/metaflow/tree/airflow is tracking the Airflow integration on top of Kubernetes.
Kubernetes support was added via Argo Workflows, great! https://github.com/Netflix/metaflow/pull/992 (Dispatch Metaflow flows to Argo Workflows)
This branch tracks the work for this issue.
#1256 adds formal support for Airflow in Metaflow. Docs & release announcement to follow soon!
https://outerbounds.com/blog/better-airflow-with-metaflow/