dask-kubernetes

Add status to `DaskJob` CRD

Open • bstadlbauer opened this issue 1 year ago • 1 comment

Would it be possible to track the status of a job in the top-level DaskJob CR? This would hide the "implementation details" of the job from the user: instead of having to know about the state of the job-runner pod that gets created as part of a DaskJob, one could look at the DaskJob resource alone to get that information.

A particular use case is the Flyte plugin I am currently working on. Most Flyte (backend) plugins work by creating a k8s resource, which they then continuously poll for updates. The plugin machinery does not allow reaching out to arbitrary k8s resources, because the initially created resource is updated and passed into the plugin from the outside. The exact interface definition is here, where the resource argument in Plugin.GetTaskPhase() would correspond to the DaskJob CR created in Plugin.BuildResource().

Similar Job objects do also set a Status field on their Custom Resources, e.g.:

Happy to contribute something if we can align on a format for the status. My initial proposal would be:

type JobStatus string

const (
	JobStatusPending   JobStatus = "PENDING"
	JobStatusRunning   JobStatus = "RUNNING"
	JobStatusStopped   JobStatus = "STOPPED"
	JobStatusSucceeded JobStatus = "SUCCEEDED"
	JobStatusFailed    JobStatus = "FAILED"
)

type DaskJobStatus struct {
	DaskClusterName string       `json:"daskClusterName,omitempty"`
	JobStatus       JobStatus    `json:"jobStatus,omitempty"`
	StartTime       *metav1.Time `json:"startTime,omitempty"`
	EndTime         *metav1.Time `json:"endTime,omitempty"`
}

This comment has a more detailed explanation of why the current state is blocking the Flyte plugin development.
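
For illustration, this is roughly what the consuming side could look like once the status lives on the DaskJob itself: the plugin only inspects the resource it is handed and never looks up the runner pod. This is a sketch under stated assumptions, not Flyte's actual API: `TaskPhase`, its values, the trimmed-down `DaskJob` type, and `phaseFromDaskJob` are all hypothetical stand-ins, and treating `STOPPED` as a failure is an arbitrary choice made for the example.

```go
package main

import "fmt"

// JobStatus mirrors the enum from the proposal.
type JobStatus string

const (
	JobStatusPending   JobStatus = "PENDING"
	JobStatusRunning   JobStatus = "RUNNING"
	JobStatusStopped   JobStatus = "STOPPED"
	JobStatusSucceeded JobStatus = "SUCCEEDED"
	JobStatusFailed    JobStatus = "FAILED"
)

// DaskJob is a trimmed-down, hypothetical view of the custom resource;
// only the proposed status field is modelled here.
type DaskJob struct {
	Status struct {
		JobStatus JobStatus `json:"jobStatus,omitempty"`
	} `json:"status,omitempty"`
}

// TaskPhase is a hypothetical stand-in for the phase type of a plugin framework.
type TaskPhase string

const (
	PhaseQueued  TaskPhase = "Queued"
	PhaseRunning TaskPhase = "Running"
	PhaseSuccess TaskPhase = "Success"
	PhaseFailure TaskPhase = "Failure"
)

// phaseFromDaskJob derives the task phase from the DaskJob resource alone.
func phaseFromDaskJob(job DaskJob) (TaskPhase, error) {
	switch job.Status.JobStatus {
	case "", JobStatusPending:
		// An empty status means the controller has not reported anything yet.
		return PhaseQueued, nil
	case JobStatusRunning:
		return PhaseRunning, nil
	case JobStatusSucceeded:
		return PhaseSuccess, nil
	case JobStatusFailed, JobStatusStopped:
		// Assumption: a stopped job is reported as a failure.
		return PhaseFailure, nil
	default:
		return "", fmt.Errorf("unknown DaskJob status %q", job.Status.JobStatus)
	}
}

func main() {
	var job DaskJob
	job.Status.JobStatus = JobStatusRunning

	phase, err := phaseFromDaskJob(job)
	if err != nil {
		panic(err)
	}
	fmt.Println(phase)
}
```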

cc @hamersaw

bstadlbauer, Oct 10 '22 19:10

Yeah this would be great. We already do this for the DaskCluster resource so adding it for the DaskJob makes sense too.

If you are interested in contributing it, that would be fantastic.

jacobtomlinson, Oct 11 '22 12:10