PaddleCloud
Do we need the paddlectl client once we have the Kubernetes custom controller?
Once we have a TPR/CRD-declared resource:
apiVersion: paddlepaddle.org/v1
kind: TrainingJob
metadata:
  name: job-1
spec:
  image: "paddlepaddle/paddlecloud-job"
  trainer:
    entrypoint: "python train.py"
    workspace: "/home/job-1/"
    min-instance: 3
    max-instance: 6
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
  pserver:
    min-instance: 3
    max-instance: 3
    resources:
      limits:
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
Running kubectl create -f job.yaml is exactly equivalent to the current paddlectl submit -jobname xxx -gpu xxx ...
The only difference is that the paddlectl client is able to upload and download training data files.
Cool! Maybe we can use kubectl instead of paddlectl? I have some ideas about this:
- Advantage
  - Users can use some kubectl features directly, such as kubectl logs, kubectl get pods, ..., so we don't need to implement these features on the cloud server.
  - We can use RBAC instead of Django admin to manage the users.
- Disadvantage
  - kubectl uses YAML as the configuration file, so it's hard to use command-line parameters.
- Plus disadvantage: kubectl exposes too many Kubernetes details that users may never use.
An extra suggestion: shall we change the resource name from TrainingJob to Paddle? Maybe it makes more sense.
Plus disadvantage: if a YAML file is malformed, it's hard to find where the error is, so it's not convenient for users.
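One way to soften this disadvantage would be to check the manifest before it is submitted and report each problem together with the path of the offending field. The following is only a sketch under assumptions: it works on an already-parsed manifest (a plain dict), the field names come from the TrainingJob example in this thread, and the list of required fields and the min/max rule are hypothetical, not something the thread specifies.

```python
# Hypothetical pre-submission check for a parsed TrainingJob manifest.
# Field names follow the example in this thread; the rules are assumptions.
_MISSING = object()

REQUIRED_FIELDS = {
    ("metadata", "name"): str,
    ("spec", "image"): str,
    ("spec", "trainer", "entrypoint"): str,
    ("spec", "trainer", "min-instance"): int,
    ("spec", "trainer", "max-instance"): int,
}


def lookup(doc, path):
    """Walk nested dicts along path; return _MISSING if any key is absent."""
    node = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def validate(job):
    """Return a list of error strings, each naming the field it refers to."""
    errors = []
    if lookup(job, ("kind",)) != "TrainingJob":
        errors.append("kind: expected 'TrainingJob'")
    for path, expected in REQUIRED_FIELDS.items():
        dotted = ".".join(path)
        value = lookup(job, path)
        if value is _MISSING:
            errors.append(dotted + ": missing")
        elif not isinstance(value, expected):
            errors.append("%s: expected %s, got %s"
                          % (dotted, expected.__name__, type(value).__name__))
    lo = lookup(job, ("spec", "trainer", "min-instance"))
    hi = lookup(job, ("spec", "trainer", "max-instance"))
    if isinstance(lo, int) and isinstance(hi, int) and lo > hi:
        errors.append("spec.trainer: min-instance (%d) is greater than "
                      "max-instance (%d)" % (lo, hi))
    return errors
```

An empty list means the manifest passed these checks; otherwise each entry starts with a dotted path such as spec.trainer.entrypoint, which tells the user where to look.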
@Yancey1989 thought TrainingJob is more general, not only paddle training.
This is interesting thinking. 👍 My 2 cents: can we make paddlectl a kind of proxy to kubectl, so that we can filter out the features we don't want to expose to end users before the parameters actually hit kubectl, and still keep the same command pattern?
Maybe our local command line can take the YAML as input, so we don't have to map the user's input to YAML again.
I am more inclined not to allow our users to use kubectl, since what we want to support is just a subset of kubectl (e.g., do we want to allow the user to create any Pod?). Maybe we can use @putcn's idea: "make paddlectl kind of proxy to kubectl, so that we can do some filtering".
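The filtering idea can be sketched as an allow-list check that runs before anything is forwarded to Kubernetes. This is a minimal illustration, not the actual paddlectl or cloud server code: the set of permitted verbs, the Rejected exception, and the check_request function are all hypothetical names, and the only resource kind allowed is the TrainingJob from this thread.

```python
# Hypothetical allow-list filter for a paddlectl-to-kubectl proxy.
# The verb list is an assumption; the thread does not say which
# kubectl features should be exposed.
ALLOWED_VERBS = {"create", "get", "delete", "logs"}
ALLOWED_KINDS = {"TrainingJob"}


class Rejected(Exception):
    """Raised when a request asks for something outside the subset."""


def check_request(verb, manifest=None):
    """Raise Rejected unless the verb and resource kind are permitted.

    manifest is the parsed YAML (a dict) for verbs that carry one.
    """
    if verb not in ALLOWED_VERBS:
        raise Rejected("verb %r is not exposed to users" % verb)
    if verb == "create":
        if not isinstance(manifest, dict):
            raise Rejected("create requires a manifest")
        kind = manifest.get("kind")
        if kind not in ALLOWED_KINDS:
            # This is the "do we want users to create any Pod?" case.
            raise Rejected("kind %r is not allowed" % kind)
```

With this in place, a TrainingJob manifest passes, while a request to create an arbitrary Pod, or to run a verb outside the list, is stopped before it reaches the cluster.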
I support @putcn's idea! Proxying and filtering is simple enough and easy!
From @helinwang: "do we want to allow the user create any Pod"
I don't think so; it's not safe and it's out of our control.
From @putcn: "make paddlectl kind of proxy to kubectl, so that we can do some filtering"
It's a good idea! We can use the cloud server as a proxy: paddlectl converts the command-line parameters to YAML, and the cloud server submits the YAML to Kubernetes.
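The conversion step described here could look roughly like the sketch below. It is an illustration only: -jobname and -gpu appear in this thread, but the other flags (-image, -entrypoint, -min-instance, -max-instance), their defaults, and the build_manifest and render functions are hypothetical. To stay within the standard library, the manifest is rendered as JSON, which is also valid YAML.

```python
import argparse
import json


def build_manifest(argv):
    """Map paddlectl-style flags to a TrainingJob manifest (a dict)."""
    parser = argparse.ArgumentParser(prog="paddlectl submit")
    parser.add_argument("-jobname", required=True)
    parser.add_argument("-gpu", type=int, default=0)
    # The flags below are assumptions made for this sketch.
    parser.add_argument("-image", default="paddlepaddle/paddlecloud-job")
    parser.add_argument("-entrypoint", required=True)
    parser.add_argument("-min-instance", dest="min_instance", type=int, default=1)
    parser.add_argument("-max-instance", dest="max_instance", type=int, default=1)
    args = parser.parse_args(argv)

    limits = {}
    if args.gpu > 0:
        limits["alpha.kubernetes.io/nvidia-gpu"] = args.gpu

    return {
        "apiVersion": "paddlepaddle.org/v1",
        "kind": "TrainingJob",
        "metadata": {"name": args.jobname},
        "spec": {
            "image": args.image,
            "trainer": {
                "entrypoint": args.entrypoint,
                "min-instance": args.min_instance,
                "max-instance": args.max_instance,
                "resources": {"limits": limits},
            },
        },
    }


def render(manifest):
    """Serialize the manifest as JSON, which is also valid YAML."""
    return json.dumps(manifest, indent=2)
```

For example, build_manifest(["-jobname", "job-1", "-gpu", "1", "-entrypoint", "python train.py"]) yields a manifest whose metadata.name is job-1, and render() turns it into text that the client could send to the cloud server.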
Maybe I can develop this feature. How about pushing it to the controller branch, so that we can publish a complete feature (auto-scaling) when we merge into the develop branch?
@Yancey1989 Sure, that would be awesome!
That's a great idea. I have one more question, @Yancey1989: why do we need the cloud server to submit the YAML to Kubernetes? Could paddlectl submit the YAML directly?
Hi @pineking, as described in design #378, PaddleCloud has its own account management and RBAC in Kubernetes is too simple, so we cannot submit the YAML directly. I think this is the main reason.
@Yancey1989, thanks, I will read the design.
Today's discussion result:
- We still need the server, since it knows about cloud storage. The command line will stay backward compatible (internally converting to YAML) and will also support submitting YAML directly. The client will send the YAML to the server.
- Eventually the controller will start / scale / kill training jobs (for now the controller only scales jobs).