[Feature] [Optimization] Coalesce steps into a single container execution

Open kumare3 opened this issue 3 years ago • 1 comments

Motivation: Why do you think this is important? FlytePropeller currently schedules each task as a new container instantiation. Not all tasks are alike, but for some, such as simple container tasks, it is possible to run subsequent steps on the same node and thus avoid the penalty of storing data in a backend store and hydrating it again. This should be available as an optimization without any code changes for the user, and the execution graph should still look the same.

Goal: What should the final outcome look like, ideally? As a user, when I write the graph, I do not think about writing data to an object store and then reading it back. I just expect that intermediate datasets (between subsequent steps) are stored durably. As a platform owner, I would like to avoid round-tripping data between subsequent steps. This would greatly improve performance and reduce the number of pods scheduled on K8s.
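To make the goal concrete, here is a minimal, self-contained sketch of the difference between running each step separately and coalescing them. Nothing here is Flyte API: `FakeObjectStore`, `run_separately`, and `run_coalesced` are hypothetical names, and the sketch assumes that "stored durably" means every intermediate is still written once, just never read back.

```python
import pickle
from typing import Any, Callable, Dict, List

Step = Callable[[Any], Any]


class FakeObjectStore:
    """Hypothetical stand-in for a backend blob store; counts reads and writes."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.writes = 0
        self.reads = 0

    def put(self, key: str, value: Any) -> None:
        self.blobs[key] = pickle.dumps(value)
        self.writes += 1

    def get(self, key: str) -> Any:
        self.reads += 1
        return pickle.loads(self.blobs[key])


def run_separately(steps: List[Step], value: Any, store: FakeObjectStore) -> Any:
    """One 'container' per step: each step hydrates its input from the store."""
    key = "input"
    store.put(key, value)
    for i, step in enumerate(steps):
        result = step(store.get(key))  # read the previous output back
        key = f"step-{i}"
        store.put(key, result)         # persist this step's output
    return store.get(key)


def run_coalesced(steps: List[Step], value: Any, store: FakeObjectStore) -> Any:
    """All steps in one process: outputs are persisted but passed on in memory."""
    for i, step in enumerate(steps):
        value = step(value)
        store.put(f"step-{i}", value)  # still durable, but never read back
    return value
```

In this toy model both variants produce the same result and the same stored intermediates; the coalesced variant simply performs no reads, which is the round trip the issue wants to avoid.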

Describe alternatives you've considered: The upcoming intra-task checkpointing feature should let users create checkpoints within a task, which would effectively resolve the above problem. But in that scenario, the onus of managing state and resuming falls on the user. It would be ideal if this were abstracted away from the user and simply handled by the platform.

Flyte component

  • [x] Overall
  • [ ] Flyte Setup and Installation scripts
  • [ ] Flyte Documentation
  • [ ] Flyte communication (slack/email etc)
  • [ ] FlytePropeller
  • [ ] FlyteIDL (Flyte specification language)
  • [ ] Flytekit (Python SDK)
  • [ ] FlyteAdmin (Control Plane service)
  • [ ] FlytePlugins
  • [ ] DataCatalog
  • [ ] FlyteStdlib (common libraries)
  • [ ] FlyteConsole (UI)
  • [ ] Other

Additional context: This has the potential to greatly speed up many simple, linear DAGs, making multi-step workflows performant and thus more desirable for users.

Is this a blocker for you to adopt Flyte? NA

kumare3 avatar Dec 21 '20 04:12 kumare3

This issue should get a spec first.

kumare3 avatar Dec 21 '20 04:12 kumare3

@kumare3 I am interested to know if there has been any progress on this?

I think that if Flyte manages to break the "each task is a container" idea, it will really differentiate Flyte from other workflow orchestration tools (which have all gone fairly hard towards making each task a container in recent years, with the possible exception of GitHub Workflows, which run every step/task within each job in the same worker).

thesuperzapper avatar Dec 13 '22 22:12 thesuperzapper

@thesuperzapper sadly no progress yet. We do have a design but no RFC. Let's hope next year. Definitely join Slack and let's have a chat. There are already containerless backend plugins in Flyte; check those out.

kumare3 avatar Dec 14 '22 03:12 kumare3

@kumare3 I'm curious to know if there is any update on this ticket. It is very important for our current project.

mahanh avatar Dec 14 '23 00:12 mahanh

@mahanh we have this working in a prototype. Please ping me on Slack; we would love to understand more.

kumare3 avatar Dec 14 '23 01:12 kumare3

To describe how we would use this feature:

We often have 20+ tasks in one sub-workflow that each take either 10-15 minutes or about 1 second (if skipped). They are really just running a single executable, but we still need to run it because the skip-check logic lives in C#. Coalescing would speed up the skipped case a ton, since starting a single process and exiting is miles faster than getting k8s to schedule a new pod.

We might be able to use some FlyteAgent with long-running "Sync" tasks or something (it seems like we could just call sp.Popen or similar and block), but that seems like an over-engineered solution for what should probably be a platform feature. Essentially a reusable "object pool", but for pods.

Also, we often need small "bash-style" scripts to quickly move a file, rename one, make a small request, stuff like that. Those only really require plain Python anyway, so it would be great to be able to coalesce them too.
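For illustration, the "start a process and block" idea could look roughly like the sketch below: one long-lived worker runs each step's executable in turn instead of one pod per step. This is only an assumption about the shape of such a worker; `run_steps` is a hypothetical helper, not a Flyte or FlyteAgent API, and it uses only the standard `subprocess` module.

```python
import subprocess
from typing import List, Sequence


def run_steps(commands: Sequence[Sequence[str]]) -> List[int]:
    """Run each command in order inside the current worker.

    Blocks on every child process and stops at the first non-zero exit code.
    Returns the exit codes of the commands that were actually run.
    """
    codes: List[int] = []
    for cmd in commands:
        proc = subprocess.Popen(list(cmd))
        codes.append(proc.wait())  # block until this step exits
        if codes[-1] != 0:
            break                  # later steps are not started
    return codes
```

A step that decides to skip just exits quickly with code 0, so the cost of a skipped step is one process start rather than one pod schedule.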

EraYaN avatar Mar 13 '24 09:03 EraYaN