pai-master label is missing on single-node deployment
Organization Name: Advantech
Short summary about the issue/question:
The `pai-master=true` label is missing on a single-node deployment.
Briefly describe the process you are following:
Because of hardware resource limitations, we deploy one master and one worker on a single node.
The deployment succeeds once we comment out some duplicate-checking code. However, it then stays pending during the OpenPAI service deployment.
When I checked the node status, I found that the `pai-master` label was missing.
The deployment proceeds once I restore the label manually (`kubectl label nodes ${node} pai-master=true`).
Where should I force the master label to be applied in a single-node deployment?
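As a stopgap, the status check and the manual fix described above can be wrapped in a small helper. This is only a sketch: `run`, `ensure_master_label`, and the `DRY_RUN` variable are hypothetical names introduced here (not part of OpenPAI), and it assumes `kubectl` is already configured for the cluster. With `DRY_RUN=1` it only prints the commands, in the same `echo kubectl ...` style the deployment template uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenPAI): check for and restore the
# pai-master label on one node. Assumes kubectl is configured for the cluster.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (mirrors the template's
# `echo kubectl ...` style).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

ensure_master_label() {
  local node="$1"
  # List nodes that currently carry pai-master=true; no nodes listed means the label is missing.
  run kubectl get nodes -l pai-master=true
  # Re-apply the label; --overwrite=true also replaces an existing pai-master value.
  run kubectl label --overwrite=true nodes "$node" pai-master=true
}
```

For example, `DRY_RUN=1 ensure_master_label my-node` (where `my-node` is a placeholder hostname) prints both commands; dropping `DRY_RUN=1` runs them against the cluster.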
How to reproduce it:
1. Check out OpenPAI v1.5.0 and set the master and the worker to the same node in `layout.yaml`.
2. Run the `quick-start-service.sh` script.
OpenPAI Environment:
- OpenPAI version: V1.5.0
- OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
I didn't investigate the problem because single-node installation is not supported yet.
As for node labeling, it is done here: https://github.com/microsoft/pai/blob/v1.5.0/src/cluster-configuration/deploy/start.sh.template#L43-L60
Hmm... It is a little strange that the original template does not work unless it is changed as shown below (a nested `if` block that first marks `pai-master` and then marks `pai-worker`):
```diff
diff --git a/src/cluster-configuration/deploy/start.sh.template b/src/cluster-configuration/deploy/start.sh.template
index dec727a..3fc2a15 100644
--- a/src/cluster-configuration/deploy/start.sh.template
+++ b/src/cluster-configuration/deploy/start.sh.template
@@ -41,8 +41,16 @@ kubectl apply --overwrite=true -f priority-class.yaml || exit $?
 # Add `pai-master`, `pai-worker`, `pai-storage` label to corresponding nodes and remove irrelant labels
 (
 {%- for host in cluster_cfg['layout']['machine-list'] %}
+# {%- if 'pai-master' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-master'] == 'true' %}
+#echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true pai-worker=false || exit $?
+# {%- else %}
+#echo kubectl label nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master- || exit $?
+# {%- endif %}
 {%- if 'pai-master' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-master'] == 'true' %}
-echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true pai-worker=false || exit $?
+echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master=true || exit $?
+ {%- if 'pai-worker' in cluster_cfg['layout']['machine-list'][host] and cluster_cfg['layout']['machine-list'][host]['pai-worker'] == 'true' %}
+echo kubectl label --overwrite=true nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-worker=true || exit $?
+ {%- endif %}
 {%- else %}
 echo kubectl label nodes {{ cluster_cfg['layout']['machine-list'][host]['hostname'] }} pai-master- || exit $?
 {%- endif %}
```
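To compare what the two versions of this branch emit, the Jinja logic for a single `machine-list` entry can be re-expressed in plain bash. This is an illustrative sketch, not OpenPAI code: `original_label_cmds` and `patched_label_cmds` are hypothetical helpers that reproduce only the branch visible in the diff (flags are passed as `true` or anything else) and ignore the rest of `start.sh.template`.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation of the labeling branch shown in the diff,
# for one machine-list entry. Prints the kubectl commands; runs nothing.
set -euo pipefail

# Original branch (the removed line): a master is always given pai-worker=false.
original_label_cmds() {
  local host="$1" is_master="${2:-false}"
  if [[ "$is_master" == "true" ]]; then
    echo "kubectl label --overwrite=true nodes ${host} pai-master=true pai-worker=false"
  else
    # Non-master entries have any pai-master label removed.
    echo "kubectl label nodes ${host} pai-master-"
  fi
}

# Patched branch: label the master first, then pai-worker=true only if the
# same entry is also flagged as a worker.
patched_label_cmds() {
  local host="$1" is_master="${2:-false}" is_worker="${3:-false}"
  if [[ "$is_master" == "true" ]]; then
    echo "kubectl label --overwrite=true nodes ${host} pai-master=true"
    if [[ "$is_worker" == "true" ]]; then
      echo "kubectl label --overwrite=true nodes ${host} pai-worker=true"
    fi
  else
    # Unchanged: non-master entries have any pai-master label removed.
    echo "kubectl label nodes ${host} pai-master-"
  fi
}
```

For a host flagged as both master and worker, `patched_label_cmds node1 true true` prints the `pai-master=true` command followed by the `pai-worker=true` command, while `original_label_cmds node1 true` sets `pai-worker=false` in the same command as `pai-master=true`. Calling the helpers once per entry also shows that, if the same hostname appeared in two entries and the later one were not a master, its `pai-master-` command would remove the label again; the thread does not say whether the layout looked like that.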