
Mandatory Service

Open JosephKang opened this issue 3 years ago • 0 comments

What would you like to be added: In OpenPAI v1.5.0 the total service components captured from info.log during basic deployment are: ['dshuttle-csi', 'marketplace-db', 'cluster-configuration', 'internal-storage', 'alert-manager', 'fluentd', 'database-controller', 'marketplace-webportal', 'log-manager', 'openpai-runtime', 'hivedscheduler', 'postgresql', 'marketplace-restserver', 'storage-manager', 'device-plugin', 'watchdog', 'frameworkcontroller', 'grafana', 'webportal', 'prometheus', 'rest-server', 'pylon', 'k8s-dashboard', 'job-exporter', 'dshuttle-master', 'node-exporter', 'dshuttle-worker'].

In an offline deployment environment, or when Docker Hub limits apply (https://github.com/microsoft/pai/issues/5202), it takes additional time and disk space to mirror all of them in a local registry.

It would help to have a minimal list of mandatory services for deployment.

Why is this needed:

Based on the list below, we estimate that around 17GB of Docker images have to be pulled across the devbox, master and worker. If not all of them are required in an offline environment, skipping the optional ones would save deployment time and storage space.

- Devbox
27910a99b233    3.02GB  openpai/dev-box

- Master
da86e6ba6ca1    742kB   gcr.io/google-containers/pause
da86e6ba6ca1    742kB   gcr.io/google_containers/pause-amd64
1c35c4412082    1.22MB  busybox
66d704127f79    9.54MB  openpai/log-manager-cleaner
0b7eeeeb7a9e    21.2MB  rocm/k8s-device-plugin
e5a616e4b9cf    22.9MB  openpai/node-exporter
90f439897fb1    36.1MB  prom/alertmanager
28c771d7cfbf    40.6MB  quay.io/coreos/etcd
680bc53e5985    42.2MB  coredns/coredns
283860d96794    46.8MB  calico/kube-controllers
dfe4432cd2e2    47.7MB  gcr.io/google-containers/cluster-proportional-autoscaler-amd64
dfe4432cd2e2    47.7MB  k8s.gcr.io/cluster-proportional-autoscaler-amd64
a4d224a32ba5    48.6MB  openpai/watchdog
e1cd8f190802    53.1MB  frameworkcontroller/frameworkcontroller
1610bb411c22    55.6MB  hivedscheduler/hivedscheduler
abbb30c1f3e6    63.2MB  everpeace/k8s-host-device-plugin
75b52b7d5662    64.1MB  nvidia/k8s-device-plugin
ee18f350636d    81.6MB  openpai/kube-scheduler
3b8ffbdbcca3    82.8MB  gcr.io/google-containers/kube-proxy
133a50b2b327    83.6MB  gcr.io/google-containers/kube-scheduler
7e91e27a5806    96.1MB  certbot/certbot
53f3fd8007f7    109MB   nginx
c8ecf7c719c1    112MB   openpai/prometheus
f9aed6605b81    122MB   openpai/kubernetes-dashboard-amd64
68adc2c4c12a    124MB   openpai/internal-storage-delete
ebf5db7b0d17    124MB   openpai/internal-storage-create
9beeba249f3e    127MB   nginx
1a6ade52d471    135MB   calico/cni
01aec835c89f    151MB   gcr.io/google-containers/kube-controller-manager
bf4ff15c9db0    156MB   calico/node
48db9392345b    160MB   gcr.io/google-containers/kube-apiserver
3e52f79771fe    178MB   openpai/postgresql-init-client
254aee69546f    190MB   nvcr.io/nvidia/k8s-device-plugin
ec628cda46f9    279MB   openpai/fluentd
aa4e2272202e    285MB   openpai/grafana
a8cba86132ee    322MB   openpai/log-manager-nginx
ec61ce630b02    348MB   openpai/postgresql
f88dfa384cc4    348MB   openpai/marketplace-db
593787c79bd6    709MB   openpai/pylon
59992b19b6de    974MB   openpai/cleaning-image
2afda31b1aa7    1GB     openpai/marketplace-restserver
038b3d496ee0    1.06GB  openpai/database-controller
612d5d77cf09    1.07GB  openpai/job-exporter
a5ef25a4ced7    1.08GB  openpai/rest-server
1bdd19146cb0    1.19GB  openpai/marketplace-webportal
b0f4cb231332    1.76GB  openpai/webportal

- Worker
da86e6ba6ca1    742kB   gcr.io/google-containers/pause
da86e6ba6ca1    742kB   gcr.io/google_containers/pause-amd64
66d704127f79    9.54MB  openpai/log-manager-cleaner
0b7eeeeb7a9e    21.2MB  rocm/k8s-device-plugin
e5a616e4b9cf    22.9MB  openpai/node-exporter
680bc53e5985    42.2MB  coredns/coredns
283860d96794    46.8MB  calico/kube-controllers
abbb30c1f3e6    63.2MB  everpeace/k8s-host-device-plugin
75b52b7d5662    64.1MB  nvidia/k8s-device-plugin
3b8ffbdbcca3    82.8MB  gcr.io/google-containers/kube-proxy
133a50b2b327    83.6MB  gcr.io/google-containers/kube-scheduler
05fcda0d8e42    105MB   nvidia/cuda
53f3fd8007f7    109MB   nginx
2ec708416bb8    122MB   nvidia/cuda
68adc2c4c12a    124MB   openpai/internal-storage-delete
9beeba249f3e    127MB   nginx
1a6ade52d471    135MB   calico/cni
01aec835c89f    151MB   gcr.io/google-containers/kube-controller-manager
bf4ff15c9db0    156MB   calico/node
48db9392345b    160MB   gcr.io/google-containers/kube-apiserver
254aee69546f    190MB   nvcr.io/nvidia/k8s-device-plugin
a8cba86132ee    322MB   openpai/log-manager-nginx
a31215877a99    333MB   openpai/openpai-runtime
59992b19b6de    974MB   openpai/cleaning-image
612d5d77cf09    1.07GB  openpai/job-exporter
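
As a rough way to check such an estimate, the listings above can be summed with a small script. This is only a sketch: the function names `parse_size` and `total_unique_bytes` are made up for illustration, the sizes are assumed to use decimal units (1GB = 10^9 bytes), and an image ID that appears under several repository names is counted once. It does not account for layers shared between different images, so real disk usage may be lower.

```python
import re

# Assumption: sizes use decimal units, as in the listings above.
UNIT_BYTES = {"kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a size string such as '742kB' or '1.19GB' to bytes."""
    match = re.fullmatch(r"([\d.]+)(kB|MB|GB)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNIT_BYTES[match.group(2)]


def total_unique_bytes(listing):
    """Sum image sizes from 'ID SIZE REPOSITORY' lines, one count per image ID."""
    sizes = {}
    for line in listing.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip headers such as '- Master' and blank lines
        image_id, size, _repository = parts
        sizes[image_id] = parse_size(size)
    return sum(sizes.values())


# A few rows copied from the master listing above.
SAMPLE = """
da86e6ba6ca1    742kB   gcr.io/google-containers/pause
da86e6ba6ca1    742kB   gcr.io/google_containers/pause-amd64
2afda31b1aa7    1GB     openpai/marketplace-restserver
1bdd19146cb0    1.19GB  openpai/marketplace-webportal
f88dfa384cc4    348MB   openpai/marketplace-db
"""

if __name__ == "__main__":
    print(f"{total_unique_bytes(SAMPLE) / 10**9:.2f} GB")
```

Pasting the full devbox, master and worker listings into `SAMPLE` gives a per-cluster figure that can be compared against the registry storage actually allocated.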

Without this feature, how does the current module work: We may need to allocate more than 20GB of Docker registry storage, and both building the local registry and deploying the OpenPAI services take extra time.

Components that may involve changes: Maybe we can skip deploying the marketplace-db, marketplace-restserver and marketplace-webportal services.
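
To illustrate the idea, here is a minimal sketch of deriving a reduced service list from the 27 components listed at the top of this issue. The helper `mandatory_services` and the `OPTIONAL_PREFIXES` setting are hypothetical and not part of OpenPAI, and whether the remaining services work without the marketplace components is an assumption that would need to be confirmed.

```python
# Service components captured from info.log in OpenPAI v1.5.0 (from this issue).
ALL_SERVICES = [
    "dshuttle-csi", "marketplace-db", "cluster-configuration",
    "internal-storage", "alert-manager", "fluentd", "database-controller",
    "marketplace-webportal", "log-manager", "openpai-runtime",
    "hivedscheduler", "postgresql", "marketplace-restserver",
    "storage-manager", "device-plugin", "watchdog", "frameworkcontroller",
    "grafana", "webportal", "prometheus", "rest-server", "pylon",
    "k8s-dashboard", "job-exporter", "dshuttle-master", "node-exporter",
    "dshuttle-worker",
]

# Hypothetical setting: name prefixes of services treated as optional.
OPTIONAL_PREFIXES = ("marketplace-",)


def mandatory_services(services, optional_prefixes=OPTIONAL_PREFIXES):
    """Return the services whose names do not start with an optional prefix."""
    return [name for name in services if not name.startswith(optional_prefixes)]


if __name__ == "__main__":
    kept = mandatory_services(ALL_SERVICES)
    skipped = sorted(set(ALL_SERVICES) - set(kept))
    print(f"deploy {len(kept)} services, skip: {', '.join(skipped)}")
```

Going by the master listing above, skipping the three marketplace images alone would avoid roughly 2.5GB (1GB + 1.19GB + 348MB) of pulls.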

JosephKang, Mar 16 '21 03:03