common icon indicating copy to clipboard operation
common copied to clipboard

PR 135 makes replicaType updates hard

Open Jeffwan opened this issue 4 years ago • 3 comments

In this PR, https://github.com/kubeflow/common/pull/135/files, it changes rtype to ReplicaType.

However, it brings some challenges in operator upgrade.

-		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, rtype)
+		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, apiv1.ReplicaType(rtype))

Using this line as an example, controller retrieve rtype from pod label. It's lower case at this moment.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/tensorflow/v1/types.go#L77

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/pytorch/v1/pytorchjob_types.go#L62

apiv1.ReplicaType(rtype) is inaccurate and meaningless because we define Upper case role like Master in the past,
and this conversion give us a non-exist ReplicaType because it's lower case master.

we have some references like below.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/controller.v1/pytorch/pytorch.go#L46

I feel like all the role should be lower case, we make sure we don't use ReplicaType for comparison. Instead, still use strings.toLower(string(rtype)) which means PR 135 still not helpful?

/cc @MartinForReal @gaocegege @kubeflow/wg-training-leads

Jeffwan avatar Aug 29 '21 18:08 Jeffwan

This indeed feel like unnecessary and complicate the implementations. Kubeflow/common should aim to be easy for downstream operators to implement.

terrytangyuan avatar Aug 29 '21 21:08 terrytangyuan

Maybe more helper functions would resolve comparison issue? I think we can add some type name convention by defining ReplicaType type and add more util functions.

MartinForReal avatar Aug 30 '21 02:08 MartinForReal

Maybe more helper functions would resolve comparison issue? I think we can add some type name convention by defining ReplicaType type and add more util functions.

@MartinForReal

Yeah, I add something like https://github.com/kubeflow/tf-operator/pull/1388/files#diff-7600abe3663b532d64fec9dae52fcb3b972fb034c53a45fe0030fc61776d1612R56

Did you use kubeflow/common in operator other than kubeflow training operators?

Jeffwan avatar Aug 30 '21 03:08 Jeffwan