
Add support for Kueue.

Open shrinandj opened this issue 1 year ago • 3 comments

This commit adds support for using Kueue to submit jobs/pods into Kubernetes. There are two config options:

  • KUEUE_ENABLED: set to True/False
  • KUEUE_LOCALQUEUE_NAME: set to the name of the LocalQueue configured with Kueue. See the Kueue documentation for details.

The config options can be set in the main metaflow config or via the @kubernetes decorator.
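A minimal sketch of how the two sources of configuration could be combined, assuming a per-step value from @kubernetes overrides the global config when it is set. The function name `resolve_kueue`, its parameters, and the sample queue name are hypothetical and are not taken from this PR's code.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for values read from the main Metaflow config.
GLOBAL_CONFIG = {"KUEUE_ENABLED": True, "KUEUE_LOCALQUEUE_NAME": "team-a-queue"}


def resolve_kueue(
    global_config: dict,
    step_enabled: Optional[bool] = None,
    step_queue: Optional[str] = None,
) -> Tuple[bool, Optional[str]]:
    """Return the effective (enabled, localqueue_name) pair for one step.

    A value of None for a step-level argument means "not set on the
    decorator", so the global config value is used instead.
    """
    if step_enabled is None:
        enabled = bool(global_config.get("KUEUE_ENABLED", False))
    else:
        enabled = bool(step_enabled)

    if step_queue is None:
        queue = global_config.get("KUEUE_LOCALQUEUE_NAME")
    else:
        queue = step_queue

    # Enabling Kueue without naming a LocalQueue is treated as a config error
    # in this sketch; the real behavior may differ.
    if enabled and not queue:
        raise ValueError("KUEUE_ENABLED is set but no LocalQueue name was given")
    return enabled, queue


if __name__ == "__main__":
    print(resolve_kueue(GLOBAL_CONFIG))                      # global config only
    print(resolve_kueue(GLOBAL_CONFIG, step_enabled=False))  # step-level opt-out
```

Using None as the "unset" marker keeps an explicit False on a step distinguishable from a step that does not mention Kueue at all.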

Testing Done:

  • Verified that specifying Kueue config options in the Metaflow config (~/.metaflowconfig/config.json) works as expected.

  • Verified that specifying Kueue config options in @kubernetes works as expected.

  • Verified that @kubernetes options take precedence over the global config.

    • If the global KUEUE_ENABLED config is True but a particular step sets it to False, that step does not run with Kueue.
  • Verified that the Kueue labels and annotations are set correctly and that Kueue actually runs the jobs.

  • Verified that if Kueue is configured to manage "pod", the Argo Workflows pods created by Metaflow are scheduled by Kueue.

  • Verified that the default behavior is to not use Kueue and that everything works as before (jobs and Argo Workflows).
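For illustration, the labelling checked above might look roughly like the sketch below. Kueue selects a workload's queue through the `kueue.x-k8s.io/queue-name` label; the helper `with_kueue_labels`, the metadata layout, and the sample names are assumptions for this example, not this PR's implementation. Any annotations the PR sets are not shown because they are not listed here.

```python
import copy
from typing import Optional

# Label that Kueue reads to find the LocalQueue for a Job or Pod.
QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"


def with_kueue_labels(
    metadata: dict, enabled: bool, queue_name: Optional[str]
) -> dict:
    """Return a copy of Kubernetes object metadata, labelled for Kueue if enabled.

    When Kueue is disabled the metadata is returned unchanged, which matches
    the default behavior of not using Kueue.
    """
    result = copy.deepcopy(metadata)  # never mutate the caller's dict
    if enabled:
        if not queue_name:
            raise ValueError("a LocalQueue name is required when Kueue is enabled")
        result.setdefault("labels", {})[QUEUE_NAME_LABEL] = queue_name
    return result


if __name__ == "__main__":
    # Hypothetical job metadata; real Metaflow jobs carry more fields.
    base = {"name": "example-step-job", "labels": {"app": "metaflow"}}
    print(with_kueue_labels(base, enabled=True, queue_name="team-a-queue"))
    print(with_kueue_labels(base, enabled=False, queue_name=None))
```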

shrinandj avatar Feb 26 '24 20:02 shrinandj

Mergeable anytime from my end -- no impact on core.

romain-intel avatar Feb 29 '24 08:02 romain-intel

I'm interested in this PR - is it actually going to happen or dead in the water?

Hi there, I would be interested as well

DonIvanCorleone avatar Jul 31 '24 07:07 DonIvanCorleone