
pai-master label is missing on single-node deployment

Open JosephKang opened this issue 4 years ago • 2 comments

Organization Name: Advantech

Short summary about the issue/question: pai-master=true label is missing on single node deployment

Brief what process you are following: We deploy one master and one worker on a single node because of limited hardware resources. The deployment works once we comment out some duplicate-checking code. However, it then stays pending at the OpenPAI service deployment step. I checked the node status and found that the pai-master label was missing. The deployment can proceed once I restore the label manually (kubectl label nodes ${node} pai-master=true). Where should I force the master label to be applied for a single-node deployment?
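The manual workaround described above can be sketched as a small shell helper. This is only an illustration under stated assumptions, not part of OpenPAI: the function names `has_pai_master_label` and `restore_pai_master_label` and the `KUBECTL` override variable are hypothetical, and the node name passed in is whatever name `kubectl get nodes` reports for the machine.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the KUBECTL override are illustrative
# assumptions, not OpenPAI code.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds if the given node currently carries the pai-master=true label.
has_pai_master_label() {
  local node="$1"
  # `kubectl get nodes -l <selector> -o name` prints lines like "node/<name>".
  $KUBECTL get nodes -l pai-master=true -o name | grep -qxF "node/${node}"
}

# Re-applies the label only when it is missing.
restore_pai_master_label() {
  local node="$1"
  if has_pai_master_label "$node"; then
    echo "pai-master=true already set on ${node}"
  else
    $KUBECTL label --overwrite=true nodes "$node" pai-master=true
  fi
}
```

After sourcing the snippet, `restore_pai_master_label <node-name>` would apply the label only if it is absent, so it should be safe to run repeatedly while the service deployment is pending.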

How to reproduce it: 1. Check out OpenPAI v1.5.0 and set the master and the worker to the same node in layout.yaml. 2. Run the quick-start-service.sh script.

OpenPAI Environment:

  • OpenPAI version: V1.5.0
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS

JosephKang avatar Apr 08 '21 04:04 JosephKang

I didn't investigate the problem because single-node installation is not supported yet.

That said, node labeling is done here: https://github.com/microsoft/pai/blob/v1.5.0/src/cluster-configuration/deploy/start.sh.template#L43-L60

hzy46 avatar Apr 08 '21 06:04 hzy46

Hmm... It is a little strange that the original template fails to work unless it is changed as described below (a nested if block that marks pai-master first and then pai-worker).

diff --git a/src/cluster-configuration/deploy/start.sh.template b/src/cluster-configuration/deploy/start.sh.template
index dec727a..3fc2a15 100644
--- a/src/cluster-configuration/deploy/start.sh.template
+++ b/src/cluster-configuration/deploy/start.sh.template
@@ -41,8 +41,16 @@ kubectl apply --overwrite=true -f priority-class.yaml || exit $?
 # Add `pai-master`, `pai-worker`, `pai-storage` label to corresponding nodes and remove irrelant labels
 (
 {%- for host in cluster_cfg['layout']['machine-list'] %}
+#    {%- if 'pai-master' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-master'] == 'true' %}
+#echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true pai-worker=false || exit $?
+#    {%- else %}
+#echo kubectl label nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master- || exit $?
+#    {%- endif %}
     {%- if 'pai-master' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-master'] == 'true' %}
-echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true pai-worker=false || exit $?
+echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true || exit $?
+        {%- if 'pai-worker' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-worker'] == 'true' %}
+echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-worker=true || exit $?
+        {%- endif %}
     {%- else %}
 echo kubectl label nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master- || exit $?
     {%- endif %}

JosephKang avatar Apr 11 '21 15:04 JosephKang
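The branch logic of the patched template shown above can be sketched in plain shell. This is a rough illustration, not OpenPAI code: `label_commands` is a hypothetical helper that takes a hostname and two "true"/"false" flags and, like the template, only prints the kubectl commands instead of running them. It covers only the pai-master branch visible in the diff; whatever the template does after that hunk is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: `label_commands` is a hypothetical helper, not an OpenPAI function.
# It prints (does not run) the kubectl commands that the patched pai-master
# branch would emit for a single host.
label_commands() {
  local hostname="$1" is_master="$2" is_worker="$3"
  if [ "$is_master" = "true" ]; then
    echo "kubectl label --overwrite=true nodes ${hostname} pai-master=true"
    # Patched behaviour: a master node may also be labelled as a worker.
    if [ "$is_worker" = "true" ]; then
      echo "kubectl label --overwrite=true nodes ${hostname} pai-worker=true"
    fi
  else
    # A trailing "-" after the key removes that label from the node.
    echo "kubectl label nodes ${hostname} pai-master-"
  fi
}
```

For a single-node layout, `label_commands node1 true true` prints both label commands for the same host, whereas the original template line would have printed `pai-master=true pai-worker=false` for it.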