
Do we need paddlectl client once we have the kubernetes custom controller?

Open typhoonzero opened this issue 7 years ago • 15 comments

Once we have a resource declared via TPR/CRD:

apiVersion: paddlepaddle.org/v1
kind: TrainingJob
metadata:
  name: job-1
spec:
  image: "paddlepaddle/paddlecloud-job"
  trainer:
    entrypoint: "python train.py"
    workspace: "/home/job-1/"
    min-instance: 3
    max-instance: 6
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
  pserver:
    min-instance: 3
    max-instance: 3
    resources:
      limits:
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"

Running kubectl create -f job.yaml is exactly equivalent to the current paddlectl submit -jobname xxx -gpu xxx ...

The only difference is that the paddlectl client is able to upload and download training data files.

typhoonzero avatar Oct 11 '17 07:10 typhoonzero

Cool! Maybe we can use kubectl instead of paddlectl? I have some ideas about this:

  • Advantage

    • Users can use kubectl features such as kubectl logs and kubectl get pods directly, so we don't need to implement them on the cloud server.
    • We can use RBAC instead of Django admin to manage the users.
  • Disadvantage

    • kubectl uses YAML as the configuration file, so it's hard to pass settings as command-line parameters.

Yancey1989 avatar Oct 11 '17 07:10 Yancey1989

One more disadvantage: kubectl exposes too many Kubernetes details that users may never use.

typhoonzero avatar Oct 11 '17 07:10 typhoonzero

An extra suggestion: shall we change the resource name from TrainingJob to Paddle? Maybe it makes more sense.

Yancey1989 avatar Oct 11 '17 07:10 Yancey1989

One more disadvantage: if a YAML file's format is wrong, it's hard to find where the error is, so it's not convenient for users.
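One way to soften this would be to validate the manifest before submitting it and report the path of each bad field. The following is only a minimal sketch: the function name validate_training_job and the rules it checks are assumptions for illustration, not the controller's actual validation. It takes the manifest as an already-parsed dict with the field names from the example above.

```python
def validate_training_job(manifest):
    """Return a list of problems, each naming the path of the offending field.

    The rules here are hypothetical; the real controller may check different things.
    """
    errors = []
    if manifest.get("kind") != "TrainingJob":
        errors.append("kind: expected 'TrainingJob', got %r" % manifest.get("kind"))

    spec = manifest.get("spec")
    if not isinstance(spec, dict):
        errors.append("spec: missing or not a mapping")
        return errors

    for role in ("trainer", "pserver"):
        section = spec.get(role)
        if not isinstance(section, dict):
            errors.append("spec.%s: missing or not a mapping" % role)
            continue
        lo = section.get("min-instance")
        hi = section.get("max-instance")
        # Each instance count must be a positive integer.
        for key, value in (("min-instance", lo), ("max-instance", hi)):
            if not isinstance(value, int) or value < 1:
                errors.append(
                    "spec.%s.%s: expected a positive integer, got %r" % (role, key, value)
                )
        # The range must not be inverted.
        if isinstance(lo, int) and isinstance(hi, int) and lo > hi:
            errors.append(
                "spec.%s: min-instance (%d) exceeds max-instance (%d)" % (role, lo, hi)
            )
    return errors
```

An empty list means the manifest passed these checks; otherwise each entry points at the field to fix.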

gongweibao avatar Oct 11 '17 08:10 gongweibao

@Yancey1989 I thought TrainingJob is more general, not only for Paddle training.

typhoonzero avatar Oct 11 '17 08:10 typhoonzero

This is an interesting idea. 👍 My 2 cents: can we make paddlectl a kind of proxy to kubectl, so that we can filter out the features we don't want to expose to end users before the parameters actually hit kubectl, and still keep the same command pattern?

putcn avatar Oct 11 '17 18:10 putcn

Maybe our local command line can take the YAML as input, so we don't have to map the user's input to YAML again.

I am more inclined not to allow our users to use kubectl, since what we want to support is just a subset of kubectl (e.g., do we want to allow the user to create any Pod?). Maybe we can use @putcn's idea: "make paddlectl kind of proxy to kubectl, so that we can do some filtering".
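As a rough illustration of the filtering, the proxy could accept only an allow-list of resource kinds before anything reaches Kubernetes. This is a sketch under assumptions: the names ALLOWED_KINDS and split_allowed are made up for the example, and the manifests are assumed to be already parsed into dicts.

```python
# Only resource kinds on this allow-list are forwarded to Kubernetes.
# The list itself is an assumption; the thread only mentions TrainingJob.
ALLOWED_KINDS = {"TrainingJob"}


def split_allowed(manifests):
    """Split parsed manifests into (accepted, rejected) by their 'kind' field."""
    accepted, rejected = [], []
    for manifest in manifests:
        if manifest.get("kind") in ALLOWED_KINDS:
            accepted.append(manifest)
        else:
            rejected.append(manifest)
    return accepted, rejected
```

With this, a user-supplied Pod manifest would land in the rejected list, while a TrainingJob would be passed on.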

helinwang avatar Oct 11 '17 18:10 helinwang

I support @putcn's idea! Proxying and filtering is simple and easy!

typhoonzero avatar Oct 12 '17 07:10 typhoonzero

From @helinwang

do we want to allow the user create any Pod

I don't think so; it's not safe and it's out of our control.

From @putcn

make paddlectl kind of proxy to kubectl, so that we can do some filtering

It's a good idea! We can use the cloud server as a proxy: paddlectl converts the command-line parameters to YAML, and the cloud server submits the YAML to Kubernetes.
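The conversion step might look roughly like the sketch below. Only -jobname and -gpu appear in this thread; the function name build_training_job, its other parameters, the derived workspace path, and the default CPU/memory values (copied from the example manifest) are assumptions. The result is printed as JSON, which is also valid YAML.

```python
import json

# Resource key taken from the example manifest in this issue.
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def build_training_job(jobname, image, entrypoint, gpu=0,
                       trainer_instances=(3, 6), pserver_instances=(3, 3)):
    """Build a TrainingJob manifest (a dict) from command-line style parameters."""

    def resources(with_gpu):
        # CPU/memory defaults are copied from the example; they are not real defaults.
        limits = {"cpu": "800m", "memory": "1Gi"}
        if with_gpu and gpu > 0:
            limits[GPU_RESOURCE] = gpu
        return {"limits": limits, "requests": {"cpu": "500m", "memory": "600Mi"}}

    return {
        "apiVersion": "paddlepaddle.org/v1",
        "kind": "TrainingJob",
        "metadata": {"name": jobname},
        "spec": {
            "image": image,
            "trainer": {
                "entrypoint": entrypoint,
                # Assumption: the workspace is derived from the job name.
                "workspace": "/home/%s/" % jobname,
                "min-instance": trainer_instances[0],
                "max-instance": trainer_instances[1],
                "resources": resources(with_gpu=True),
            },
            "pserver": {
                "min-instance": pserver_instances[0],
                "max-instance": pserver_instances[1],
                "resources": resources(with_gpu=False),
            },
        },
    }


manifest = build_training_job(
    "job-1", "paddlepaddle/paddlecloud-job", "python train.py", gpu=1
)
print(json.dumps(manifest, indent=2))
```

The client would then send this document to the cloud server rather than to Kubernetes directly.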

Yancey1989 avatar Oct 12 '17 07:10 Yancey1989

Maybe I can develop this feature. How about pushing it to the controller branch, so that we can publish a complete feature (auto-scaling) when we merge into the develop branch?

Yancey1989 avatar Oct 12 '17 09:10 Yancey1989

@Yancey1989 Sure, that would be awesome!

helinwang avatar Oct 12 '17 22:10 helinwang

That's a great idea. I have one more question, @Yancey1989: why do we need the cloud server to submit the YAML to Kubernetes? Could paddlectl submit the YAML directly?

pineking avatar Oct 13 '17 02:10 pineking

Hi @pineking, per the design in #378, PaddleCloud has its own account management, and RBAC in Kubernetes is too simple for it, so we cannot submit the YAML directly. I think this is the main reason.

Yancey1989 avatar Oct 13 '17 04:10 Yancey1989

@Yancey1989, thanks, I will read the design.

pineking avatar Oct 13 '17 05:10 pineking

Today's discussion result:

  1. We still need the server, since it knows about cloud storage. The command line will stay backward compatible (internally converting to YAML) and will also support submitting YAML directly. The client will send the YAML to the server.

  2. Eventually the controller will start, scale, and kill training jobs (for now the controller only scales jobs).

helinwang avatar Oct 18 '17 03:10 helinwang