Support judgment on status of multi-containers

Open Hcryw opened this issue 9 months ago • 4 comments

What is the problem you're trying to solve

In https://github.com/volcano-sh/volcano/blob/master/pkg/controllers/job/job_controller_handler.go#L273-L274:

we noticed that the comments mention that multi-container adaptation will be supported, but this has not been implemented yet. We recently hit an issue: when a pod runs multiple containers and the main training container succeeds or fails while another container is still running, the task status never transitions to a final state. This happens because the referenced code determines the status from Pod.Status.Phase alone, rather than from the individual container states.
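To illustrate the mismatch, here is a minimal Go sketch of the documented Kubernetes pod phase rules for a pod whose containers have all started and are not restarted. The `ContainerState` type and `simplifiedPodPhase` function are hypothetical names made up for this example. They are not Volcano or Kubernetes APIs, and the real phase computation in the kubelet covers more cases.

```go
package main

// ContainerState is a hypothetical, simplified stand-in for the per-container
// information Kubernetes reports in Pod.Status.ContainerStatuses.
type ContainerState struct {
	Name       string
	Terminated bool  // true once the container has exited
	ExitCode   int32 // only meaningful when Terminated is true
}

// simplifiedPodPhase approximates the pod phase for a non-empty list of
// started containers that are never restarted (an assumption of this sketch):
// the pod stays "Running" while any container is still running, and only
// becomes "Succeeded" or "Failed" once every container has terminated.
func simplifiedPodPhase(containers []ContainerState) string {
	failed := false
	for _, c := range containers {
		if !c.Terminated {
			// One live container is enough to keep the whole pod Running.
			return "Running"
		}
		if c.ExitCode != 0 {
			failed = true
		}
	}
	if failed {
		return "Failed"
	}
	return "Succeeded"
}
```

With a finished `training` container and a still-running `monitor` container, this model reports `Running`, which is why a check based only on the pod phase never sees a final state.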

Describe the solution you'd like

We would like the code to determine status from the states of the individual containers, rather than relying solely on the pod status to drive the update logic.

Additional context

No response

Hcryw avatar Apr 02 '25 11:04 Hcryw

It is necessary to add multi-container support

hwdef avatar Apr 16 '25 06:04 hwdef

/cc

JesseStutler avatar May 06 '25 03:05 JesseStutler

Hi, to change the task status based on whether the main training container succeeds or fails, should we add a separate field to specify the main containers? Something like this:

spec:
  tasks:
  - replicas: 1
    template:
      spec:
        containers:
        - name: training
          image: training:latest
        - name: monitor
          image: monitor:latest
    primaryContainers:
    - training
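A rough Go sketch of how a controller might evaluate such a field is shown below. Everything in it is an assumption for discussion: `primaryContainers` is only the field proposed above, and `PrimaryStatus`, `TaskOutcome`, and `outcomeFromPrimaries` are hypothetical names, not existing Volcano or Kubernetes APIs. The sketch uses local types so it can run on its own. A real implementation would presumably read the container statuses from the pod object instead.

```go
package main

// PrimaryStatus is a hypothetical, simplified view of one container's status.
type PrimaryStatus struct {
	Name       string
	Terminated bool  // true once the container has exited
	ExitCode   int32 // only meaningful when Terminated is true
}

// TaskOutcome is a hypothetical result type for this sketch.
type TaskOutcome string

const (
	OutcomeRunning   TaskOutcome = "Running"
	OutcomeSucceeded TaskOutcome = "Succeeded"
	OutcomeFailed    TaskOutcome = "Failed"
)

// outcomeFromPrimaries looks only at the containers named in primaries and
// ignores all others (for example a monitoring sidecar).
//   - any primary container exited non-zero -> Failed
//   - every primary container exited zero   -> Succeeded
//   - otherwise                             -> Running
// With an empty primaries list it returns Running, so that a caller can fall
// back to its existing pod-phase logic.
func outcomeFromPrimaries(statuses []PrimaryStatus, primaries []string) TaskOutcome {
	if len(primaries) == 0 {
		return OutcomeRunning
	}
	primary := make(map[string]bool, len(primaries))
	for _, name := range primaries {
		primary[name] = true
	}

	finished := 0
	for _, s := range statuses {
		if !primary[s.Name] || !s.Terminated {
			continue
		}
		if s.ExitCode != 0 {
			return OutcomeFailed
		}
		finished++
	}
	// A primary container with no reported status counts as not finished.
	if finished < len(primary) {
		return OutcomeRunning
	}
	return OutcomeSucceeded
}
```

For the manifest above, a `training` container that exited with code 0 would yield `Succeeded` even while `monitor` keeps running. Whether a failure of a non-primary container should also affect the task is left open here.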

Shivansh-yadav13 avatar May 21 '25 09:05 Shivansh-yadav13

latest: sidecar feature

Hcryw avatar May 23 '25 07:05 Hcryw