Jiaxin Shan
Currently, the mxnet-operator API doesn't follow the `kubeflow/common` convention, but its controller does: it imports the tf-operator implementation (`https://github.com/kubeflow/mxnet-operator/blob/c718707b304dc1ed0210a740c8efe7071d4ebb3e/pkg/controller.v1/mxnet/controller.go#L42`). To graduate mxnet-operator to v1, we'd like to migrate it to follow the `kubeflow/common` convention...
I created this parent ticket to track all the changes needed to graduate mxnet-operator to v1. ### Configuration and deployment Description | Category | Status | Issue -- | -- |...
Every framework's implementation is very similar, so I think we don't actually need that many controllers/operators. If we can support custom roles, most popular frameworks can adapt to it...
A follow-up PR to https://github.com/kubeflow/common/pull/155. In the common code base, we still use low-level listers and `hasSynced`, which are no longer used in training-operator. Let's change to...
This PR, https://github.com/kubeflow/common/pull/135/files, changes `rtype` to `ReplicaType`. However, it introduces some challenges when upgrading operators:
```
- expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, rtype)
+ expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, apiv1.ReplicaType(rtype))
```
Using...
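The snippet is truncated, so this sketch only illustrates the compile-time side of the change: once the helper takes a named `ReplicaType` instead of a plain `string`, every caller holding a `string` variable must add an explicit conversion. `genKeyOld`, `genKeyNew` and the `<jobKey>/<lowercased type>/pods` key format are hypothetical stand-ins, not the actual kubeflow/common code.

```go
package main

import (
	"fmt"
	"strings"
)

// ReplicaType mirrors the named string type the PR switched to
// (assumption: its underlying type is string, as the cast in the diff implies).
type ReplicaType string

// genKeyOld is a hypothetical stand-in for the helper before the change.
// The "<jobKey>/<lowercased type>/pods" format is an assumption.
func genKeyOld(jobKey string, replicaType string) string {
	return jobKey + "/" + strings.ToLower(replicaType) + "/pods"
}

// genKeyNew keeps the same logic behind the new, typed signature.
func genKeyNew(jobKey string, replicaType ReplicaType) string {
	return jobKey + "/" + strings.ToLower(string(replicaType)) + "/pods"
}

func main() {
	rtype := "Worker" // downstream operators often hold the type as a plain string

	// genKeyNew("ns/job", rtype) would not compile: a string variable is not
	// assignable to ReplicaType, so each call site needs an explicit conversion.
	newKey := genKeyNew("ns/job", ReplicaType(rtype))
	oldKey := genKeyOld("ns/job", rtype)

	fmt.Println(newKey)
	// If the body is unchanged, the key value stays the same; only the
	// signature breaks callers.
	fmt.Println(oldKey == newKey)
}
```

Under these assumptions, existing expectation keys remain stable across the upgrade, and the cost falls on every downstream operator that has to touch its call sites.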
All of the following are referenced by different operators. This makes logs very messy, and we need to consolidate on one common utility: 1. "github.com/sirupsen/logrus" 2. "github.com/go-logr/logr" - default logger interface...
@zw0610 and I presented the [all-in-one training operator proposal](https://docs.google.com/document/d/1x1JPDQfDMIbnoQRftDH1IzGU0qvHGSU4W6Jl4rJLPhI/edit?usp=drive_web&ouid=111760271373344466402) at last month's community meeting. WG-Training leads have already agreed to move forward. This issue was created to track implementation progress. The...
Some job controllers already have Prometheus support while others do not. This story tracks progress on exposing Prometheus metrics for all controllers. - Build a common...
We should extract the common logic from kube-batch/Volcano and make sure users can pass in their own client based on their choice.
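A minimal sketch of what "users pass their own client" could look like, assuming the shared logic is reduced to an interface the controller depends on. `GangScheduler`, `JobController.SyncGang`, and the in-memory implementation are hypothetical; a real implementation would wrap a kube-batch or Volcano client and manage their PodGroup objects.

```go
package main

import (
	"fmt"
	"sync"
)

// GangScheduler is a hypothetical abstraction over kube-batch/Volcano-style
// pod group management. The controller only sees this interface.
type GangScheduler interface {
	SchedulerName() string
	EnsurePodGroup(jobKey string, minMember int32) error
	DeletePodGroup(jobKey string) error
}

// inMemoryScheduler stands in for a user-supplied kube-batch or Volcano client.
type inMemoryScheduler struct {
	mu     sync.Mutex
	groups map[string]int32 // jobKey -> minMember
}

func newInMemoryScheduler() *inMemoryScheduler {
	return &inMemoryScheduler{groups: map[string]int32{}}
}

func (s *inMemoryScheduler) SchedulerName() string { return "in-memory" }

func (s *inMemoryScheduler) EnsurePodGroup(jobKey string, minMember int32) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.groups[jobKey] = minMember
	return nil
}

func (s *inMemoryScheduler) DeletePodGroup(jobKey string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.groups, jobKey)
	return nil
}

// JobController holds whichever GangScheduler the user injected;
// nil means gang scheduling is disabled.
type JobController struct {
	Gang GangScheduler
}

// SyncGang ensures one pod group whose minMember equals the total replica
// count, so all replicas of the job are scheduled together or not at all.
func (c *JobController) SyncGang(jobKey string, replicas map[string]int32) error {
	if c.Gang == nil {
		return nil
	}
	var total int32
	for _, n := range replicas {
		total += n
	}
	if err := c.Gang.EnsurePodGroup(jobKey, total); err != nil {
		return fmt.Errorf("%s: ensure pod group for %s: %w", c.Gang.SchedulerName(), jobKey, err)
	}
	return nil
}

func main() {
	sched := newInMemoryScheduler()
	ctrl := &JobController{Gang: sched}
	if err := ctrl.SyncGang("default/mnist", map[string]int32{"Worker": 3, "PS": 1}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sched.groups["default/mnist"])
}
```

Keeping the scheduler behind an interface means the common code never imports a specific scheduler's clientset; each operator wires in the one it uses.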
Currently, `kubeflow/common` provides logger utilities for operators to use. They use `github.com/sirupsen/logrus` underneath: https://github.com/kubeflow/common/blob/master/pkg/util/logger.go#L20. The problem I've noticed is that new operators built with `kubebuilder` have a built-in logger for operator logic...