Jiaxin Shan comments

Results: 742 comments by Jiaxin Shan

Unified training operator working progress

An update on the above items. @zw0610 @kubeflow/wg-training-leads

> 1. code repo process, project name -> confirm with Boggy.

Reuse tf-operator and rename it to `kubeflow/training-operator`. Pending confirmation with Boggy. All issues,...

Unified training operator working progress

> Is there any limitation why we need to use Kubernetes 1.19? Can we just jump to 1.20 or even to the latest 1.21 version?

Yeah, this is...

Unified training operator working progress

@johnugeorge Sure. I will cc all training leads on PRs coming into the feature branch.

Dynamic roles which can technically support any potential frameworks

Yeah. I am thinking about how we can insert the "clusterSpec" environment for different frameworks?

```
{
  "worker": ["worker0.example.com:2222", "worker1.example.com:2222", "worker2.example.com:2222"],
  "ps": ["ps0.example.com:2222", "ps1.example.com:2222"]
}
```

Different frameworks have different settings for this part. The...
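To make the question concrete, here is a rough Python sketch of one way per-framework environment injection could look. It is only an illustration, not the operator's code: `build_cluster_spec`, `tensorflow_env`, `pytorch_env`, and `ENV_BUILDERS` are hypothetical names, and the host naming, the port, and the "first host is the master" rule for PyTorch are assumptions. The only things taken as given are the cluster layout shown above, TensorFlow's `TF_CONFIG` variable, and the `MASTER_ADDR`/`MASTER_PORT`/`WORLD_SIZE`/`RANK` variables used by PyTorch's `env://` initialization.

```python
import json


def build_cluster_spec(replicas, domain="example.com", port=2222):
    """Map each replica type to its host:port list (naming scheme is assumed)."""
    return {
        rtype: [f"{rtype}{i}.{domain}:{port}" for i in range(count)]
        for rtype, count in replicas.items()
    }


def tensorflow_env(cluster, rtype, index):
    """TensorFlow reads the cluster layout and the task identity from TF_CONFIG."""
    config = {"cluster": cluster, "task": {"type": rtype, "index": index}}
    return {"TF_CONFIG": json.dumps(config)}


def pytorch_env(cluster, rtype, index):
    """PyTorch env:// rendezvous; treating the first listed host as master is an assumption."""
    hosts = [host for role in cluster for host in cluster[role]]
    master_addr, master_port = hosts[0].split(":")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": master_port,
        "WORLD_SIZE": str(len(hosts)),
        "RANK": str(hosts.index(cluster[rtype][index])),
    }


# Hypothetical registry: each framework plugs in its own environment builder.
ENV_BUILDERS = {"tensorflow": tensorflow_env, "pytorch": pytorch_env}

cluster = build_cluster_spec({"worker": 3, "ps": 2})
print(ENV_BUILDERS["tensorflow"](cluster, "worker", 1)["TF_CONFIG"])
```

The point of the registry is that the shared controller only needs the replica layout, while each framework decides how that layout is rendered into environment variables.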

Use Interface instead of lister/informer in controller

/assign @zw0610

Use Interface instead of lister/informer in controller

@zw0610 sure. Feel free to pick it up

Allow modification of controller's replica index/type labels

I think we can have an interface there and leave replica-type as the default implementation in kubeflow/common. Different operators can still override it if there's a need.
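A minimal sketch of that idea, written in Python for brevity rather than the Go used by the operators. All names here (`ReplicaLabeler`, `DefaultReplicaLabeler`, `CustomRoleLabeler`) and the label keys are illustrative assumptions, not the actual kubeflow/common API.

```python
from abc import ABC, abstractmethod


class ReplicaLabeler(ABC):
    """Hypothetical interface: how a controller labels the pods of a replica."""

    @abstractmethod
    def labels(self, job_name, replica_type, index):
        ...


class DefaultReplicaLabeler(ReplicaLabeler):
    """Default implementation that would live in the shared library (keys are assumed)."""

    def labels(self, job_name, replica_type, index):
        return {
            "job-name": job_name,
            "replica-type": replica_type.lower(),
            "replica-index": str(index),
        }


class CustomRoleLabeler(DefaultReplicaLabeler):
    """Hypothetical operator-specific override that renames the replica-type key."""

    def labels(self, job_name, replica_type, index):
        result = super().labels(job_name, replica_type, index)
        result["job-role"] = result.pop("replica-type")
        return result


print(DefaultReplicaLabeler().labels("mnist", "Worker", 0))
print(CustomRoleLabeler().labels("mnist", "Worker", 0))
```

Operators that are happy with the defaults would use `DefaultReplicaLabeler` unchanged; only the ones with special needs would subclass it.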

Consider supporting SuccessPolicy and FailurePolicy

Having success/failure policies would be great. They would make it easier for different frameworks to handle errors and would help make the reconciler logic extensible.
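As a rough illustration of why this helps extensibility, the sketch below shows a reconciler step that takes the policies as parameters instead of hard-coding framework rules. The policy names, the `job_phase` function, and the `worker-0` chief convention are all assumptions made for the example, not an existing API.

```python
from enum import Enum


class SuccessPolicy(Enum):
    ALL_REPLICAS = "AllReplicas"  # every replica must succeed
    CHIEF_ONLY = "ChiefOnly"      # the chief's result decides


class FailurePolicy(Enum):
    ANY_REPLICA = "AnyReplica"    # one failed replica fails the job
    CHIEF_ONLY = "ChiefOnly"      # only a failed chief fails the job


def job_phase(phases, success, failure, chief="worker-0"):
    """Derive the job phase from per-replica phases under the given policies."""
    failed = [name for name, phase in phases.items() if phase == "Failed"]
    if failure is FailurePolicy.ANY_REPLICA and failed:
        return "Failed"
    if failure is FailurePolicy.CHIEF_ONLY and chief in failed:
        return "Failed"
    if success is SuccessPolicy.CHIEF_ONLY and phases.get(chief) == "Succeeded":
        return "Succeeded"
    if success is SuccessPolicy.ALL_REPLICAS and phases and all(
        phase == "Succeeded" for phase in phases.values()
    ):
        return "Succeeded"
    return "Running"


phases = {"worker-0": "Succeeded", "worker-1": "Running"}
print(job_phase(phases, SuccessPolicy.CHIEF_ONLY, FailurePolicy.ANY_REPLICA))
print(job_phase(phases, SuccessPolicy.ALL_REPLICAS, FailurePolicy.ANY_REPLICA))
```

With this shape, a framework that tolerates worker failures would only pick a different `FailurePolicy`, and the shared reconciler code would stay the same.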

add dependabot config script

@DavidSpek I don't quite understand the purpose here. I think pipelines has many third-party dependencies. However, other projects, like the training operators, only use Go and Python (SDK). Currently, it's not...

add dependabot config script

@DavidSpek Right. The script seems to generate an `assignees` list from the OWNERS files and then create the dependabot YAML. I am wondering whether this is required by dependabot, because `dependabot` doesn't need...