Mandatory Service
What would you like to be added: In OpenPAI v1.5.0, a basic deployment installs the following service components (captured from info.log): ['dshuttle-csi', 'marketplace-db', 'cluster-configuration', 'internal-storage', 'alert-manager', 'fluentd', 'database-controller', 'marketplace-webportal', 'log-manager', 'openpai-runtime', 'hivedscheduler', 'postgresql', 'marketplace-restserver', 'storage-manager', 'device-plugin', 'watchdog', 'frameworkcontroller', 'grafana', 'webportal', 'prometheus', 'rest-server', 'pylon', 'k8s-dashboard', 'job-exporter', 'dshuttle-master', 'node-exporter', 'dshuttle-worker'].
In an offline deployment environment, or when Docker Hub limitations apply (https://github.com/microsoft/pai/issues/5202), mirroring all of these images to a local registry takes additional time and disk space.
It would help to have a minimal list of mandatory services for deployment.
Why is this needed:
Based on the lists below, we estimate that around 17GB of Docker images must be pulled across the devbox, master, and worker nodes. If not all of them are required in an offline environment, skipping the optional ones would reduce both deployment time and storage space.
- Devbox
27910a99b233 3.02GB openpai/dev-box
- Master
da86e6ba6ca1 742kB gcr.io/google-containers/pause
da86e6ba6ca1 742kB gcr.io/google_containers/pause-amd64
1c35c4412082 1.22MB busybox
66d704127f79 9.54MB openpai/log-manager-cleaner
0b7eeeeb7a9e 21.2MB rocm/k8s-device-plugin
e5a616e4b9cf 22.9MB openpai/node-exporter
90f439897fb1 36.1MB prom/alertmanager
28c771d7cfbf 40.6MB quay.io/coreos/etcd
680bc53e5985 42.2MB coredns/coredns
283860d96794 46.8MB calico/kube-controllers
dfe4432cd2e2 47.7MB gcr.io/google-containers/cluster-proportional-autoscaler-amd64
dfe4432cd2e2 47.7MB k8s.gcr.io/cluster-proportional-autoscaler-amd64
a4d224a32ba5 48.6MB openpai/watchdog
e1cd8f190802 53.1MB frameworkcontroller/frameworkcontroller
1610bb411c22 55.6MB hivedscheduler/hivedscheduler
abbb30c1f3e6 63.2MB everpeace/k8s-host-device-plugin
75b52b7d5662 64.1MB nvidia/k8s-device-plugin
ee18f350636d 81.6MB openpai/kube-scheduler
3b8ffbdbcca3 82.8MB gcr.io/google-containers/kube-proxy
133a50b2b327 83.6MB gcr.io/google-containers/kube-scheduler
7e91e27a5806 96.1MB certbot/certbot
53f3fd8007f7 109MB nginx
c8ecf7c719c1 112MB openpai/prometheus
f9aed6605b81 122MB openpai/kubernetes-dashboard-amd64
68adc2c4c12a 124MB openpai/internal-storage-delete
ebf5db7b0d17 124MB openpai/internal-storage-create
9beeba249f3e 127MB nginx
1a6ade52d471 135MB calico/cni
01aec835c89f 151MB gcr.io/google-containers/kube-controller-manager
bf4ff15c9db0 156MB calico/node
48db9392345b 160MB gcr.io/google-containers/kube-apiserver
3e52f79771fe 178MB openpai/postgresql-init-client
254aee69546f 190MB nvcr.io/nvidia/k8s-device-plugin
ec628cda46f9 279MB openpai/fluentd
aa4e2272202e 285MB openpai/grafana
a8cba86132ee 322MB openpai/log-manager-nginx
ec61ce630b02 348MB openpai/postgresql
f88dfa384cc4 348MB openpai/marketplace-db
593787c79bd6 709MB openpai/pylon
59992b19b6de 974MB openpai/cleaning-image
2afda31b1aa7 1GB openpai/marketplace-restserver
038b3d496ee0 1.06GB openpai/database-controller
612d5d77cf09 1.07GB openpai/job-exporter
a5ef25a4ced7 1.08GB openpai/rest-server
1bdd19146cb0 1.19GB openpai/marketplace-webportal
b0f4cb231332 1.76GB openpai/webportal
- Worker
da86e6ba6ca1 742kB gcr.io/google-containers/pause
da86e6ba6ca1 742kB gcr.io/google_containers/pause-amd64
66d704127f79 9.54MB openpai/log-manager-cleaner
0b7eeeeb7a9e 21.2MB rocm/k8s-device-plugin
e5a616e4b9cf 22.9MB openpai/node-exporter
680bc53e5985 42.2MB coredns/coredns
283860d96794 46.8MB calico/kube-controllers
abbb30c1f3e6 63.2MB everpeace/k8s-host-device-plugin
75b52b7d5662 64.1MB nvidia/k8s-device-plugin
3b8ffbdbcca3 82.8MB gcr.io/google-containers/kube-proxy
133a50b2b327 83.6MB gcr.io/google-containers/kube-scheduler
05fcda0d8e42 105MB nvidia/cuda
53f3fd8007f7 109MB nginx
2ec708416bb8 122MB nvidia/cuda
68adc2c4c12a 124MB openpai/internal-storage-delete
9beeba249f3e 127MB nginx
1a6ade52d471 135MB calico/cni
01aec835c89f 151MB gcr.io/google-containers/kube-controller-manager
bf4ff15c9db0 156MB calico/node
48db9392345b 160MB gcr.io/google-containers/kube-apiserver
254aee69546f 190MB nvcr.io/nvidia/k8s-device-plugin
a8cba86132ee 322MB openpai/log-manager-nginx
a31215877a99 333MB openpai/openpai-runtime
59992b19b6de 974MB openpai/cleaning-image
612d5d77cf09 1.07GB openpai/job-exporter
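
As a rough way to reproduce a size estimate from listings like the ones above, here is a minimal Python sketch. It assumes each line has the form `IMAGE_ID SIZE NAME` and counts every image ID once, since the same image can appear under several names. The helper names `parse_size` and `total_bytes` are hypothetical, and the sample uses only four lines from the Master list. Docker reports sizes that include layers shared between images, so a sum like this is an upper bound rather than the exact registry footprint.

```python
import re

# Decimal units as printed in the listings above (kB, MB, GB).
UNITS = {"kB": 1e3, "MB": 1e6, "GB": 1e9}


def parse_size(text):
    """Convert a size string such as '742kB' or '1.76GB' to bytes."""
    match = re.fullmatch(r"([\d.]+)(kB|MB|GB)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def total_bytes(listing):
    """Sum image sizes, counting each image ID only once."""
    sizes_by_id = {}
    for line in listing.strip().splitlines():
        image_id, size, _name = line.split(maxsplit=2)
        sizes_by_id[image_id] = parse_size(size)
    return sum(sizes_by_id.values())


# Four lines copied from the Master list; the first two share an image ID.
sample = """
da86e6ba6ca1 742kB gcr.io/google-containers/pause
da86e6ba6ca1 742kB gcr.io/google_containers/pause-amd64
593787c79bd6 709MB openpai/pylon
b0f4cb231332 1.76GB openpai/webportal
"""

print(f"{total_bytes(sample) / 1e9:.2f} GB")  # prints "2.47 GB"
```

Pasting a complete node listing into `sample` gives the per-node total under the same assumptions.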
Without this feature, how does the current module work: We may need to allocate more than 20GB of Docker registry storage, and both building the registry and deploying the OpenPAI services take extra time.
Components that may involve changes: Perhaps deployment of the marketplace-db, marketplace-restserver, and marketplace-webportal services could be skipped.
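
To illustrate the proposal, the following Python sketch filters the v1.5.0 service list from info.log down to a candidate mandatory list and adds up the image sizes of the three marketplace services as reported in the Master list. The names `ALL_SERVICES`, `OPTIONAL_SIZES_GB`, and `mandatory_services` are hypothetical. Whether these services can actually be skipped without breaking other components is an assumption that has not been verified here.

```python
# Service components captured from info.log in OpenPAI v1.5.0.
ALL_SERVICES = [
    "dshuttle-csi", "marketplace-db", "cluster-configuration",
    "internal-storage", "alert-manager", "fluentd", "database-controller",
    "marketplace-webportal", "log-manager", "openpai-runtime",
    "hivedscheduler", "postgresql", "marketplace-restserver",
    "storage-manager", "device-plugin", "watchdog", "frameworkcontroller",
    "grafana", "webportal", "prometheus", "rest-server", "pylon",
    "k8s-dashboard", "job-exporter", "dshuttle-master", "node-exporter",
    "dshuttle-worker",
]

# Candidate services to skip, with image sizes (GB) from the Master list.
# Treating them as optional is an assumption, not a confirmed fact.
OPTIONAL_SIZES_GB = {
    "marketplace-db": 0.348,
    "marketplace-restserver": 1.0,
    "marketplace-webportal": 1.19,
}


def mandatory_services(services, optional):
    """Return the services that remain after dropping the optional ones."""
    return [name for name in services if name not in optional]


remaining = mandatory_services(ALL_SERVICES, OPTIONAL_SIZES_GB)
saved_gb = sum(OPTIONAL_SIZES_GB.values())

print(f"{len(remaining)} of {len(ALL_SERVICES)} services remain")
print(f"about {saved_gb:.2f} GB of images would not need to be mirrored")
```

Under these assumptions, 24 of the 27 services remain and roughly 2.5GB of images no longer need to be mirrored.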