
Cannot remove cluster node

akumacxd opened this issue on Sep 22, 2019 · 2 comments

Description of Issue/Question

I can't remove a cluster node when running stage 2 and stage 5. (screenshot) I want to remove node004, but after modifying the policy.cfg file and executing stage 2 and stage 5, the node is not removed.

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)

# cat /srv/pillar/ceph/proposals/policy.cfg
## Cluster Assignment
#cluster-ceph/cluster/*.sls
cluster-ceph/cluster/node00[1-3]*.sls
cluster-ceph/cluster/admin*.sls

## Roles
# ADMIN  
role-master/cluster/admin*.sls
role-admin/cluster/admin*.sls

# Monitoring
role-prometheus/cluster/admin*.sls
role-grafana/cluster/admin*.sls

# MON
role-mon/cluster/node00[1-3]*.sls

# MGR (mgrs are usually colocated with mons)
role-mgr/cluster/node00[1-3]*.sls

# COMMON
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml

# Storage   
#role-storage/cluster/*.sls 
role-storage/cluster/node00[1-3]*.sls 
# salt-run state.orch ceph.stage.2
# salt-run state.orch ceph.stage.5
admin:/srv/pillar/ceph/proposals # ll role-storage/cluster/
total 20
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 admin.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node001.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node002.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node003.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 15:58 node004.example.com.sls
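
The role lines in policy.cfg use shell-style globs, so one quick sanity check against the listing above (run from the same proposals directory) is to expand the storage glob by hand and confirm node004 is excluded:

admin:/srv/pillar/ceph/proposals # ls role-storage/cluster/node00[1-3]*.sls
role-storage/cluster/node001.example.com.sls
role-storage/cluster/node002.example.com.sls
role-storage/cluster/node003.example.com.sls

The stray node004.example.com.sls proposal file can remain in place; what matters is whether any line in policy.cfg still matches it.
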
admin:~ # salt 'node004*' pillar.items
node004.example.com:
    ----------
    available_roles:
        - storage
        - admin
        - mon
        - mds
        - mgr
        - igw
        - grafana
        - prometheus
        - storage
        - rgw
        - ganesha
        - client-cephfs
        - client-radosgw
        - client-iscsi
        - client-nfs
        - benchmark-rbd
        - benchmark-blockdev
        - benchmark-fs
        - master
    benchmark:
        ----------
        default-collection:
            simple.yml
        extra_mount_opts:
            nocrc
        job-file-directory:
            /run/ceph_bench_jobs
        log-file-directory:
            /var/log/ceph_bench_logs
        work-directory:
            /run/ceph_bench
    cluster:
        ceph
    cluster_network:
        192.168.3.0/24
    deepsea_minions:
        *
    disk_led:
        ----------
        cmd:
            ----------
            fault:
                ----------
                off:
                    lsmcli local-disk-fault-led-off --path '{device_file}'
                on:
                    lsmcli local-disk-fault-led-on --path '{device_file}'
            ident:
                ----------
                off:
                    lsmcli local-disk-ident-led-off --path '{device_file}'
                on:
                    lsmcli local-disk-ident-led-on --path '{device_file}'
    fsid:
        91149fed-265c-4698-aebb-5d3535fdd70a
    monitoring:
        ----------
        prometheus:
            ----------
            metric_relabel_config:
                ----------
                ceph:
                grafana:
                node_exporter:
                prometheus:
            relabel_config:
                ----------
                ceph:
                grafana:
                node_exporter:
                prometheus:
            rule_files:
            scrape_interval:
                ----------
                ceph:
                    10s
                grafana:
                    10s
                node_exporter:
                    10s
                prometheus:
                    10s
            target_partition:
                ----------
                ceph:
                    1/1
                grafana:
                    1/1
                node_exporter:
                    1/1
                prometheus:
                    1/1
    public_network:
        192.168.2.0/24
    roles:
        - storage    
admin:~ # ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.06857 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.00980     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
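
For reference, a minimal removal sequence would look roughly like the following (a sketch; disengage.safety is an assumption based on DeepSea 0.9.x, where destructive stage 5 actions are blocked until the safety is disengaged):

admin:~ # vim /srv/pillar/ceph/proposals/policy.cfg    # drop node004 from every role line
admin:~ # salt-run state.orch ceph.stage.2             # regenerate the pillar from policy.cfg
admin:~ # salt 'node004*' pillar.get roles             # expected to come back empty before continuing
admin:~ # salt-run disengage.safety                    # assumption: required before stage 5 will remove OSDs
admin:~ # salt-run state.orch ceph.stage.5             # remove roles/OSDs that are no longer assigned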

Steps to Reproduce Issue

(Include debug logs if possible and relevant.)

Versions Report

(Provided by running: salt-run deepsea.version, rpm -qi salt-minion, rpm -qi salt-master)

admin:~ # salt-run deepsea.version
[ERROR   ] Exception during resolving address: [Errno 2] Host name lookup failure
[ERROR   ] Exception during resolving address: [Errno 2] Host name lookup failure
0.9.23+git.0.6a24f24a0
admin:~ # rpm -qi salt-minion
Name        : salt-minion
Version     : 2019.2.0
Release     : 6.3.5
Architecture: x86_64
Install Date: Sat Sep 21 14:19:15 2019
Group       : System/Management
Size        : 41019
License     : Apache-2.0
Signature   : RSA/SHA256, Tue May 28 23:28:21 2019, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.3.5.src.rpm
Build Date  : Tue May 28 23:24:20 2019
Build Host  : sheep28
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC 
URL         : http://saltstack.org/
Summary     : The client component for Saltstack
Description :
Salt minion is queried and controlled from the master.
Listens to the salt master and execute the commands.
Distribution: SUSE Linux Enterprise 15
admin:~ # rpm -qi salt-master
Name        : salt-master
Version     : 2019.2.0
Release     : 6.3.5
Architecture: x86_64
Install Date: Sat Sep 21 14:19:16 2019
Group       : System/Management
Size        : 2936818
License     : Apache-2.0
Signature   : RSA/SHA256, Tue May 28 23:28:21 2019, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.3.5.src.rpm
Build Date  : Tue May 28 23:24:20 2019
Build Host  : sheep28
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC 
URL         : http://saltstack.org/
Summary     : The management component of Saltstack with zmq protocol supported
Description :
The Salt master is the central server to which all minions connect.
Enabled commands to remote systems to be called in parallel rather
than serially.
Distribution: SUSE Linux Enterprise 15

akumacxd · Sep 22, 2019

Does stage 5 produce any error messages? Do you have a stage 5 output?

jan--f · Sep 24, 2019

> Does stage 5 produce any error messages? Do you have a stage 5 output?

Stage 5 produces no error messages.

In my opinion, after stage 2 is executed, node004 should no longer have the storage role. But when I look at pillar.items, node004 still has the storage role, and executing stage 5 produces no error message.
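
One way to narrow down where the stale role comes from (a sketch using standard Salt calls and the default /srv/pillar/ceph layout shown above):

admin:~ # salt '*' saltutil.pillar_refresh      # re-fetch the compiled pillar on every minion
admin:~ # salt 'node004*' pillar.get roles      # "storage" here means the generated pillar still assigns the role
admin:~ # grep -rl node004 /srv/pillar/ceph     # locate any generated file that still references the node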

And I found another problem: when I need to change the NTP server address, executing stage 2 doesn't work. Only executing saltutil.pillar_refresh works, but when stage 2 is executed again, the address changes back to the admin node.
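
On the NTP point, the usual pattern (a sketch; the time_server key and the exact override path are assumptions based on how the stack/default files referenced in policy.cfg are normally overridden) is to put the setting into /srv/pillar/ceph/stack/global.yml rather than editing anything under stack/default/, because the default files are regenerated and the change reverts:

admin:~ # cat /srv/pillar/ceph/stack/global.yml
# assumption: time_server is the key the stock configuration uses for the NTP source
time_server: ntp1.example.com
admin:~ # salt-run state.orch ceph.stage.2      # an override placed here should survive repeated stage 2 runs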

akumacxd · Sep 24, 2019