Support task status determination for multi-container pods
What is the problem you're trying to solve
In https://github.com/volcano-sh/volcano/blob/master/pkg/controllers/job/job_controller_handler.go#L273-L274
the comments mention that multi-container adaptation will be supported, but this has not been implemented yet. We recently hit an issue with pods that run multiple containers: if the main training container succeeds or fails while another container is still running, the task status does not transition to its final state. This happens because the referenced code determines the status only from Pod.Status.Phase, not from the individual container states.
Describe the solution you'd like
We would like the code to determine the task status from the states of the individual containers, rather than relying solely on the pod status to drive the update logic.
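To make the request concrete, here is a minimal sketch of the per-container view a controller could consult alongside the pod phase. It is not Volcano code: `ContainerStatus` is a simplified, hypothetical stand-in for the Kubernetes container status type, and `Summary` and `Summarize` are names invented for illustration.

```go
package main

// ContainerStatus is a simplified, hypothetical stand-in for the Kubernetes
// container status type. Only the fields needed for this sketch are kept.
type ContainerStatus struct {
	Name       string
	Terminated bool  // true once the container has exited
	ExitCode   int32 // meaningful only when Terminated is true
}

// Summary counts containers by coarse state (hypothetical helper type).
type Summary struct {
	Active    int // waiting or running, i.e. not yet terminated
	Succeeded int // terminated with exit code 0
	Failed    int // terminated with a non-zero exit code
}

// Summarize tallies the container states of one pod. A pod whose phase is
// still Running can already contain Succeeded or Failed containers, which
// is the information a phase-only check cannot see.
func Summarize(statuses []ContainerStatus) Summary {
	var s Summary
	for _, cs := range statuses {
		switch {
		case !cs.Terminated:
			s.Active++
		case cs.ExitCode == 0:
			s.Succeeded++
		default:
			s.Failed++
		}
	}
	return s
}
```

For the case described above (training container exited, second container still running), the summary reports one succeeded and one active container even though the pod phase remains Running.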
Additional context
No response
It is necessary to add multi-container support
/cc
Hi, to change the task status based on whether the main training container succeeds or fails, should we add a separate field that specifies the main containers? For example:
spec:
  tasks:
    - replicas: 1
      template:
        spec:
          containers:
            - name: training
              image: training:latest
            - name: monitor
              image: monitor:latest
      primaryContainers:
        - training
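A rough sketch of how a controller might evaluate such a field is shown below. Everything here is an assumption for illustration: `primaryContainers` is only the field proposed in this comment, `ContainerStatus` is a simplified stand-in for the Kubernetes container status type, and `PrimaryOutcome` is a hypothetical helper, not an existing Volcano function. The policy (fail as soon as any primary container exits non-zero, succeed once all primary containers exit with 0) is one possible choice.

```go
package main

// ContainerStatus is a simplified, hypothetical stand-in for the Kubernetes
// container status type. Only the fields needed for this sketch are kept.
type ContainerStatus struct {
	Name       string
	Terminated bool  // true once the container has exited
	ExitCode   int32 // meaningful only when Terminated is true
}

// Outcome is the task result as judged from the primary containers only.
type Outcome string

const (
	OutcomeRunning   Outcome = "Running"
	OutcomeSucceeded Outcome = "Succeeded"
	OutcomeFailed    Outcome = "Failed"
)

// PrimaryOutcome ignores every container that is not listed in primary.
// It returns Failed if any primary container exited non-zero, Succeeded if
// all primary containers exited with 0, and Running otherwise. With an
// empty primary list it returns Running, so a caller can fall back to the
// pod phase.
func PrimaryOutcome(statuses []ContainerStatus, primary []string) Outcome {
	isPrimary := make(map[string]bool, len(primary))
	for _, name := range primary {
		isPrimary[name] = true
	}
	if len(isPrimary) == 0 {
		return OutcomeRunning
	}

	finished := 0
	for _, cs := range statuses {
		if !isPrimary[cs.Name] || !cs.Terminated {
			continue
		}
		if cs.ExitCode != 0 {
			return OutcomeFailed
		}
		finished++
	}
	// Succeed only when every distinct primary container has exited with 0.
	if finished == len(isPrimary) {
		return OutcomeSucceeded
	}
	return OutcomeRunning
}
```

With the example spec above, a terminated `training` container and a still-running `monitor` container would yield Succeeded or Failed depending on the exit code of `training`, regardless of what `monitor` is doing.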
latest: sidecar feature