Jiaxin Shan

193 comments by Jiaxin Shan

@alculquicondor Has the community discussed the tradeoffs of Job vs. Pod for the launcher, and StatefulSets vs. plain Pods for workers?

I think this is a valid use case. From the user's perspective, it should be easy to use training operators with Kubeflow Pipelines. The challenge in supporting native artifacts is that all jobs...

/hold Looks like there's a way to support this in mpi-operator. If that's the case, I think the preferred option is to check the feasibility and the effort we need to make in mpi-operator...

@ryantd Do you have an idea about next steps? Do you think supporting this case in mpi-operator is a reasonable direction to go? If not, are there any technical blockers?

Nice proposal. I will leave some comments tomorrow.

The proposal overall looks good to me. One thing I am not clear about is the metrics HPA part. I feel it's better to decouple it from the job controller but I...

@Bobgy Thanks for the update. Is it the project owner's responsibility or the WG's responsibility to make sure license files are included in all the images? Do you know if there's a way...

@kuizhiqing What's your plan for kubeflow/paddle-operator? It seems you still use PaddleFlow/paddle-operator as the main dev repo?

@kuizhiqing Please have a look at the tf-operator code base and see whether that can be done easily. We heavily reuse code from kubeflow/common.

At ByteDance, we mount the Ray logs folder `/tmp/ray` to a fixed `hostPath`, `/opt/tiger/containers`, and we configure Filebeat rules to collect all Ray logs. In this case, our log daemon...