Composable Flows and Steps

Open · talebzeghmi opened this issue 5 years ago · 8 comments

Large ML projects spanning multiple teams reuse pipelines and models (e.g., ensembles, feature engineering).

There are two aspects of reuse:

  1. Reuse a whole Flow, to be able to compose a Flow of other Flows.
  2. Reuse a step (imagine it to be a feature engineering step). Steps currently do not have input parameters or return values, which makes reuse more difficult.

A Use Case:

  • A large modeling project consisting of logical steps (e.g., feature engineering, imputation, models, stack models, meta model, smoothing, validation). Each of those steps may be a flow, and each would reuse feature engineering transforms from other teams.
  • Creating a Flow for every feature engineering transform may be cumbersome compared with simple, easily reused functions (steps?).
  • Each logical step could be developed by its own team of applied scientists.

related: https://github.com/Netflix/metaflow/issues/144

talebzeghmi, Jul 02 '20

We have been thinking about (1) [as graph composition] and hopefully will publish more details on our thoughts about it. cc @tuulos. For (2), you could still get the sharing, especially for feature engineering transforms, by keeping them as a library of functions (instead of steps) that can simply be imported within your step. Some teams internal to Netflix use this route for sharing such business logic.
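A minimal sketch of the function-library route, written in plain Python so it runs without Metaflow installed. The helper names `fill_missing`, `add_ratio` and `feature_pipeline` are hypothetical; in practice they would live in a shared package that each team imports inside its own step.

```python
# Hypothetical shared library of feature-engineering transforms.
# In a real project this would be its own module/package imported from inside
# any Metaflow step; it is inlined here so the sketch runs standalone.
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Transform = Callable[[List[Row]], List[Row]]


def fill_missing(column: str, default: float) -> Transform:
    """Return a transform that replaces absent or None values in `column`."""
    def apply(rows: List[Row]) -> List[Row]:
        return [
            {**row, column: default if row.get(column) is None else row[column]}
            for row in rows
        ]
    return apply


def add_ratio(numerator: str, denominator: str, name: str) -> Transform:
    """Return a transform adding `name` = numerator / denominator (0.0 if denominator is 0)."""
    def apply(rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            den = row[denominator]
            out.append({**row, name: row[numerator] / den if den else 0.0})
        return out
    return apply


def feature_pipeline(*transforms: Transform) -> Transform:
    """Compose transforms left to right into a single reusable function."""
    def apply(rows: List[Row]) -> List[Row]:
        for transform in transforms:
            rows = transform(rows)
        return rows
    return apply


# What a step body might do: import the shared pieces and call them.
# (Hypothetical; inside Metaflow this would run in a step and store the result on self.)
features = feature_pipeline(
    fill_missing("clicks", 0.0),
    add_ratio("clicks", "impressions", "ctr"),
)

rows = [{"clicks": 5.0, "impressions": 10.0}, {"clicks": None, "impressions": 4.0}]
print(features(rows))
```

Because the transforms are ordinary functions, any flow (or notebook, or test) can reuse them without adding steps to its graph; only the composed call needs to appear in a step body.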

Also, for a relatively common collection of transformations, you could still use (1) if you want to avoid repeating even the step boilerplate.

crk-codaio, Jul 02 '20

@talebzeghmi Thank you for opening this issue! Your issue has articulated some of the exact metaflow architectural questions that our team has been having around productionizing/pipelining metaflow ... especially around the reusability of feature engineering code within multiple flows.

I don't want to have to copy and paste scikit-learn Transformer code to each new modeling flow especially when there is a lot of boilerplate/utility code that I've written around:

  1. leveraging pandas to protect against differing columns being passed in.
  2. pulling in a tagged 'production' model from a Run that is then reloaded for just the data 'transform' and not the 'fit' as well.
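A minimal sketch of both utilities, assuming a hypothetical `ColumnAlignedScaler` written in plain Python rather than a real scikit-learn Transformer. It records the training columns during `fit`, aligns incoming rows to them in `transform` (dropping extra columns and filling missing ones), and is pickled and reloaded to mimic fetching a "production" artifact from a Run and calling only `transform`. With pandas, `DataFrame.reindex(columns=..., fill_value=...)` can play the alignment role.

```python
import pickle
from typing import Dict, List


class ColumnAlignedScaler:
    """Hypothetical min-max scaler that remembers the columns it was fitted on."""

    def __init__(self, fill_value: float = 0.0) -> None:
        self.fill_value = fill_value
        self.columns: List[str] = []
        self.mins: Dict[str, float] = {}
        self.maxs: Dict[str, float] = {}

    def fit(self, rows: List[Dict[str, float]]) -> "ColumnAlignedScaler":
        # Fix the column set and order at fit time; transform() always emits exactly these.
        self.columns = sorted(rows[0].keys())
        for col in self.columns:
            values = [row[col] for row in rows]
            self.mins[col], self.maxs[col] = min(values), max(values)
        return self

    def _align(self, row: Dict[str, float]) -> Dict[str, float]:
        # Drop columns unseen during fit and fill the ones that are missing.
        return {col: row.get(col, self.fill_value) for col in self.columns}

    def transform(self, rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
        out = []
        for row in map(self._align, rows):
            scaled = {}
            for col, value in row.items():
                span = self.maxs[col] - self.mins[col]
                scaled[col] = (value - self.mins[col]) / span if span else 0.0
            out.append(scaled)
        return out


# Training run: fit once and serialize (standing in for storing a Metaflow artifact).
train = [{"age": 20.0, "income": 100.0}, {"age": 40.0, "income": 300.0}]
artifact = pickle.dumps(ColumnAlignedScaler().fit(train))

# Scoring run: reload the fitted object and call transform() only, never fit().
scaler = pickle.loads(artifact)
print(scaler.transform([{"income": 200.0, "age": 30.0, "unused": 1.0}]))
```

Keeping the column alignment inside the fitted object means every flow that reloads it gets the same protection without copying the wrapper code.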

@seeravikiran Thanks for the recommendations regarding structuring and code reusability to address some of the items presented in this issue. I will continue to investigate what that would look like on our end. In the meantime, I would like to point you to this post on the metaflow community page, which proposes a pretty interesting approach to the issue. I'm curious about your thoughts on this (or something like it).

dpatschke, Aug 07 '20

As @seeravikiran pointed out above, we have plans for graph composition. Meanwhile, this form of subclassing is supported https://github.com/Netflix/metaflow/issues/144#issuecomment-592245062
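A plain-Python sketch of reuse through subclassing, assuming a hypothetical `BaseModelingFlow` whose shared logic concrete flows inherit and selectively override. It does not import Metaflow and does not reproduce the linked comment; which class layouts Metaflow's graph parser accepts for `@step` methods should be checked against that comment and the Metaflow docs.

```python
from typing import Dict, List


class BaseModelingFlow:
    """Hypothetical base class holding logic shared across teams' flows."""

    def load(self) -> List[Dict[str, float]]:
        # Shared data loading; a real flow would read from a table or an artifact.
        return [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]

    def featurize(self, rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
        # Default feature engineering; subclasses may override this hook.
        return rows

    def run(self) -> float:
        # Fixed skeleton: subclasses customize individual hooks, not the order.
        rows = self.featurize(self.load())
        return sum(row["x"] for row in rows) / len(rows)


class SquaredFeaturesFlow(BaseModelingFlow):
    """A team-specific variant that reuses load() and run() but swaps featurize()."""

    def featurize(self, rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
        return [{"x": row["x"] ** 2} for row in rows]


print(BaseModelingFlow().run())     # mean of 1, 2, 3
print(SquaredFeaturesFlow().run())  # mean of 1, 4, 9
```

The trade-off versus a function library is tighter coupling: subclasses depend on the base class's method names and call order.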

tuulos, Aug 07 '20

@tuulos Thanks for the response and the reference. This is extremely helpful and greatly appreciated!

dpatschke, Aug 07 '20

@tuulos, would you be able to share an RFC-style document on how Metaflow would support composition? That way we can give feedback from our Applied Scientists on its usability before the code is written.

thank you!

talebzeghmi, Sep 01 '20

@talebzeghmi yep, I have been writing a doc that I should be able to share this month. I will ping you when it is available. Thanks for your patience :)

tuulos, Sep 02 '20

Hello @tuulos, any news on the doc about composable Metaflow flows that you were writing in 2020?

PertuyF, Oct 23 '23

Hi @tuulos,

Is there any progress on this topic? It would be extremely helpful for a business use case we have right now :) Any feedback appreciated.

Cheers

DonIvanCorleone, Feb 26 '24