Rob Bell

13 comments by Rob Bell

Hi folk, I've made a PR with a KEP for this feature - https://github.com/kubeflow/trainer/pull/2905. We'd love to start the discussion on how best to implement this feature.

Hi @juyterman1000, thanks for creating this PR. I think there are a few extra details that need working out to make sure this extra check is going to be robust -...

Naming ports isn't strictly necessary, but it can be quite nice when a container exposes a number of ports - you can use the name in readiness/liveness probes and the service....

Hi @Goku2099, This issue is referring to the controller deployment and service manifests in `manifests/base/manager/manager.yaml` (and the equivalent manifests in the helm chart). There are some extra details on this PR...

> > Another proposed solution is as follows: Enable the KubeFlowCallBack to send status to the TrainJob directly via kubeapi. When the operator starts a new training job it creates...

Hey @andreyvelich! Thanks for chasing, and apols for the silence. > @astefanutti @robert-bell Do we need to update KEP based on our conversation with @astefanutti and @tenzen-y at KubeCon, so...

Hi folk, @astefanutti @andreyvelich @tenzen-y, I've updated the proposal with a new design based on the discussions from KubeCon, and looked to incorporate some of the ideas from the first...

Hey @andreyvelich @tenzen-y, I've updated the KEP based on our conversation from last Wednesday. Please could you take another look? The changes are in the last two commits but in...

Hey @andreyvelich @tenzen-y @astefanutti I've done another update. Main changes since last time:
- updated the path and added versioning to the new endpoint and added an example payload....

Awesome, thanks @andreyvelich, and thank you for your detailed feedback and help moving this forward.