Jiaxin Shan
An update on above items. @zw0610 @kubeflow/wg-training-leads > 1. code repo process, project name -> confirm with Boggy. reuse tf-operator and rename to `kubeflow/training-operator`. pending confirmation with Boggy. all issues,...
> Is there any limitation why we need to use Kubernetes 1.19? Can we just jump to 1.20 or even to the latest 1.21 version?

Yeah, this is...
@johnugeorge Sure. I will cc all training leads on PRs coming into the feature branch.
Yeah. I am wondering how we can insert the "clusterSpec" environment for different frameworks.

```
{
  "worker": ["worker0.example.com:2222", "worker1.example.com:2222", "worker2.example.com:2222"],
  "ps": ["ps0.example.com:2222", "ps1.example.com:2222"]
}
```

Different frameworks have different settings for this part. The...
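To make the question concrete, here is a minimal Go sketch of one way the cluster spec above could be generated and handed to a framework-specific hook. Everything named here (`ClusterSpec`, `BuildClusterSpec`, `EnvInjector`, `TFConfigInjector`) is hypothetical and not taken from kubeflow/common; the only real-world detail assumed is that TensorFlow reads a `TF_CONFIG` environment variable containing a `cluster` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterSpec maps a replica type (e.g. "worker", "ps") to its host:port list.
// Hypothetical type, for illustration only.
type ClusterSpec map[string][]string

// BuildClusterSpec generates addresses of the form <type><index>.<domain>:<port>,
// matching the shape of the example above. Hypothetical helper.
func BuildClusterSpec(replicas map[string]int, domain string, port int) ClusterSpec {
	spec := ClusterSpec{}
	for rtype, n := range replicas {
		hosts := make([]string, 0, n)
		for i := 0; i < n; i++ {
			hosts = append(hosts, fmt.Sprintf("%s%d.%s:%d", rtype, i, domain, port))
		}
		spec[rtype] = hosts
	}
	return spec
}

// EnvInjector is a hypothetical per-framework hook: each framework decides
// which environment variable carries the cluster spec and how it is encoded.
type EnvInjector interface {
	Env(spec ClusterSpec) (name string, value string, err error)
}

// TFConfigInjector encodes the spec under the "cluster" key of TF_CONFIG.
// The "task" key is omitted here to keep the sketch short.
type TFConfigInjector struct{}

func (TFConfigInjector) Env(spec ClusterSpec) (string, string, error) {
	b, err := json.Marshal(map[string]interface{}{"cluster": spec})
	if err != nil {
		return "", "", err
	}
	return "TF_CONFIG", string(b), nil
}
```

Another framework would supply its own `EnvInjector` with a different variable name and encoding, while the spec-building code stays shared.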
/assign @zw0610
@zw0610 Sure, feel free to pick it up.
I think we can have an interface there and keep replica-type as the default implementation in kubeflow/common. Different operators can still override it if there's a need.
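A rough Go sketch of that pattern, using struct embedding for the default and a same-named method for the override. All names (`ReplicaLabeler`, `DefaultLabeler`, the two operator types) and the label formats are made up for illustration; they are assumptions, not the actual kubeflow/common API.

```go
package main

import "strings"

// ReplicaLabeler is a hypothetical interface that a shared library could expose.
type ReplicaLabeler interface {
	ReplicaLabel(replicaType string) string
}

// DefaultLabeler is the hypothetical default implementation, keyed on replica type.
type DefaultLabeler struct{}

func (DefaultLabeler) ReplicaLabel(replicaType string) string {
	return "replica-type=" + strings.ToLower(replicaType)
}

// InheritingOperator embeds the default and gets ReplicaLabel for free.
type InheritingOperator struct {
	DefaultLabeler
}

// OverridingOperator embeds the default but shadows the method with its own.
type OverridingOperator struct {
	DefaultLabeler
}

func (OverridingOperator) ReplicaLabel(replicaType string) string {
	return "role=" + strings.ToLower(replicaType)
}
```

One caveat with embedding: Go has no virtual dispatch, so a method defined on `DefaultLabeler` that calls `ReplicaLabel` internally would still use the default, not the operator's override.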
Having success/failure would be great. It would make it easier for different frameworks to handle errors, and it would help make the reconciler logic extensible.
@DavidSpek I don't quite understand the purpose here. I think pipelines has many third-party dependencies. However, other projects, like the training operators, only use Go and Python (SDK). Currently, it's not...
@DavidSpek Right. The script seems to generate the `assignees` list from the OWNERS files and then create the dependabot YAML. I am wondering whether this is required by dependabot, because `dependabot` doesn't need...