codeflare-sdk

feature: batch ray job submission

Open asyoussef opened this issue 2 years ago • 2 comments

Today, submitting a Ray job via the SDK typically takes four steps:

1. Cluster.up()
2. Cluster.wait_ready()
3. DDPJobDefinition.submit()
4. Cluster.down()
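For illustration, the current client-driven sequence could be sketched as below. This is a minimal sketch, not the SDK's own code: `ClusterLike`, `JobDefinitionLike`, and `run_job_blocking` are hypothetical names, and the assumption that `submit()` takes the cluster as an argument is mine. The point it shows is that the client process has to stay alive through every step.

```python
from typing import Any, Protocol


class ClusterLike(Protocol):
    """Hypothetical stand-in for the SDK's Cluster (only the calls named in this issue)."""

    def up(self) -> None: ...
    def wait_ready(self) -> None: ...
    def down(self) -> None: ...


class JobDefinitionLike(Protocol):
    """Hypothetical stand-in for DDPJobDefinition; submit(cluster) is an assumed signature."""

    def submit(self, cluster: ClusterLike) -> Any: ...


def run_job_blocking(cluster: ClusterLike, job_definition: JobDefinitionLike) -> Any:
    """Run the four steps in order; the caller stays connected the whole time."""
    cluster.up()                      # request the Ray cluster
    try:
        cluster.wait_ready()          # may block for hours while resources are queued
        job = job_definition.submit(cluster)
        # A real client would also wait for the job to finish here before
        # tearing the cluster down; the issue does not list that step.
        return job
    finally:
        cluster.down()                # always release the cluster, even on failure
```

The `try`/`finally` is a design choice of this sketch so that a failed wait or submit does not leave a cluster running.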

The desired feature is a single macro operation that combines the steps above, so that the client does not have to stay connected until the Ray cluster is ready (potentially hours later) in order to submit the Ray job. In other words, submit the Ray cluster request and the Ray job definition together, in a fire-and-forget manner.

As an added benefit, this is also useful when the Ray cluster is provisioned on a different OCP cluster. If the AppWrapper is self-sufficient and includes the Ray job definition, which is passed to the Ray cluster when it is created, there is no need to return a Ray cluster dashboard route to the submitting OCP cluster, which may be tricky.
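One existing resource with this "cluster plus job in one request" shape is KubeRay's RayJob custom resource, which carries an entrypoint alongside an embedded cluster spec. The sketch below only builds such a manifest as a Python dict. Whether this feature would use RayJob or embed the job inside the AppWrapper some other way is not decided in this issue, and `build_rayjob_manifest`, the image tag, and the container names are illustrative assumptions.

```python
from typing import Any, Dict


def build_rayjob_manifest(
    name: str,
    entrypoint: str,
    image: str = "rayproject/ray:2.9.0",  # placeholder image, an assumption
    workers: int = 1,
) -> Dict[str, Any]:
    """Bundle a Ray cluster spec and a job entrypoint into one self-contained manifest."""

    def pod_template(container_name: str) -> Dict[str, Any]:
        # Minimal pod template; real deployments would add resources, volumes, etc.
        return {"spec": {"containers": [{"name": container_name, "image": image}]}}

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            # Command run once the cluster is ready; no client connection is needed.
            "entrypoint": entrypoint,
            # Tear the cluster down when the job finishes (the Cluster.down() step).
            "shutdownAfterJobFinishes": True,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template("ray-head"),
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "workers",
                        "replicas": workers,
                        "rayStartParams": {},
                        "template": pod_template("ray-worker"),
                    }
                ],
            },
        },
    }


manifest = build_rayjob_manifest("batch-demo", "python train.py", workers=2)
```

Because the entrypoint travels with the cluster request, nothing has to be sent back to the submitting cluster (such as a dashboard route) before the job can start.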

asyoussef, Aug 14 '23 16:08

fyi @MichaelClifford @Maxusmusti

anishasthana, Aug 15 '23 19:08

Does this mean supporting Ray Jobs in CodeFlare?

roytman, Sep 16 '23 07:09