
Cannot remove cluster node

akumacxd opened this issue on Sep 22, 2019 · 2 comments

Description of Issue/Question

I can't remove a cluster node when running stage 2 and stage 5. (screenshot) I want to remove node004, but after modifying the policy.cfg file and executing stage 2 and stage 5, the node is not removed.

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)

# cat /srv/pillar/ceph/proposals/policy.cfg
## Cluster Assignment
#cluster-ceph/cluster/*.sls
cluster-ceph/cluster/node00[1-3]*.sls
cluster-ceph/cluster/admin*.sls

## Roles
# ADMIN  
role-master/cluster/admin*.sls
role-admin/cluster/admin*.sls

# Monitoring
role-prometheus/cluster/admin*.sls
role-grafana/cluster/admin*.sls

# MON
role-mon/cluster/node00[1-3]*.sls

# MGR (mgrs are usually colocated with mons)
role-mgr/cluster/node00[1-3]*.sls

# COMMON
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml

# Storage   
#role-storage/cluster/*.sls 
role-storage/cluster/node00[1-3]*.sls 
# salt-run state.orch ceph.stage.2
# salt-run state.orch ceph.stage.5
admin:/srv/pillar/ceph/proposals # ll role-storage/cluster/
total 20
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 admin.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node001.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node002.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 14:26 node003.example.com.sls
-rw-r--r-- 1 salt salt 17 Sep 21 15:58 node004.example.com.sls
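
The role lines in policy.cfg use shell-style globs, so one quick sanity check against the listing above (run from the same proposals directory) is to expand the storage glob by hand and confirm node004 is excluded:

admin:/srv/pillar/ceph/proposals # ls role-storage/cluster/node00[1-3]*.sls
role-storage/cluster/node001.example.com.sls
role-storage/cluster/node002.example.com.sls
role-storage/cluster/node003.example.com.sls

The stray node004.example.com.sls proposal file can remain in place; what matters is whether any line in policy.cfg still matches it.
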
admin:~ # salt 'node004*' pillar.items
node004.example.com:
    ----------
    available_roles:
        - storage
        - admin
        - mon
        - mds
        - mgr
        - igw
        - grafana
        - prometheus
        - storage
        - rgw
        - ganesha
        - client-cephfs
        - client-radosgw
        - client-iscsi
        - client-nfs
        - benchmark-rbd
        - benchmark-blockdev
        - benchmark-fs
        - master
    benchmark:
        ----------
        default-collection:
            simple.yml
        extra_mount_opts:
            nocrc
        job-file-directory:
            /run/ceph_bench_jobs
        log-file-directory:
            /var/log/ceph_bench_logs
        work-directory:
            /run/ceph_bench
    cluster:
        ceph
    cluster_network:
        192.168.3.0/24
    deepsea_minions:
        *
    disk_led:
        ----------
        cmd:
            ----------
            fault:
                ----------
                off:
                    lsmcli local-disk-fault-led-off --path '{device_file}'
                on:
                    lsmcli local-disk-fault-led-on --path '{device_file}'
            ident:
                ----------
                off:
                    lsmcli local-disk-ident-led-off --path '{device_file}'
                on:
                    lsmcli local-disk-ident-led-on --path '{device_file}'
    fsid:
        91149fed-265c-4698-aebb-5d3535fdd70a
    monitoring:
        ----------
        prometheus:
            ----------
            metric_relabel_config:
                ----------
                ceph:
                grafana:
                node_exporter:
                prometheus:
            relabel_config:
                ----------
                ceph:
                grafana:
                node_exporter:
                prometheus:
            rule_files:
            scrape_interval:
                ----------
                ceph:
                    10s
                grafana:
                    10s
                node_exporter:
                    10s
                prometheus:
                    10s
            target_partition:
                ----------
                ceph:
                    1/1
                grafana:
                    1/1
                node_exporter:
                    1/1
                prometheus:
                    1/1
    public_network:
        192.168.2.0/24
    roles:
        - storage    
admin:~ # ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.06857 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.00980     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
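
For reference, a minimal removal sequence would look roughly like the following (a sketch; disengage.safety is an assumption based on DeepSea 0.9.x, where destructive stage 5 actions are blocked until the safety is disengaged):

admin:~ # vim /srv/pillar/ceph/proposals/policy.cfg    # drop node004 from every role line
admin:~ # salt-run state.orch ceph.stage.2             # regenerate the pillar from policy.cfg
admin:~ # salt 'node004*' pillar.get roles             # expected to come back empty before continuing
admin:~ # salt-run disengage.safety                    # assumption: required before stage 5 will remove OSDs
admin:~ # salt-run state.orch ceph.stage.5             # remove roles/OSDs that are no longer assigned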

Steps to Reproduce Issue

(Include debug logs if possible and relevant.)

Versions Report

(Provided by running: salt-run deepsea.version, rpm -qi salt-minion, rpm -qi salt-master)

admin:~ # salt-run deepsea.version
[ERROR   ] Exception during resolving address: [Errno 2] Host name lookup failure
[ERROR   ] Exception during resolving address: [Errno 2] Host name lookup failure
0.9.23+git.0.6a24f24a0
admin:~ # rpm -qi salt-minion
Name        : salt-minion
Version     : 2019.2.0
Release     : 6.3.5
Architecture: x86_64
Install Date: Sat Sep 21 14:19:15 2019
Group       : System/Management
Size        : 41019
License     : Apache-2.0
Signature   : RSA/SHA256, Tue May 28 23:28:21 2019, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.3.5.src.rpm
Build Date  : Tue May 28 23:24:20 2019
Build Host  : sheep28
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC 
URL         : http://saltstack.org/
Summary     : The client component for Saltstack
Description :
Salt minion is queried and controlled from the master.
Listens to the salt master and execute the commands.
Distribution: SUSE Linux Enterprise 15
admin:~ # rpm -qi salt-master
Name        : salt-master
Version     : 2019.2.0
Release     : 6.3.5
Architecture: x86_64
Install Date: Sat Sep 21 14:19:16 2019
Group       : System/Management
Size        : 2936818
License     : Apache-2.0
Signature   : RSA/SHA256, Tue May 28 23:28:21 2019, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.3.5.src.rpm
Build Date  : Tue May 28 23:24:20 2019
Build Host  : sheep28
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC 
URL         : http://saltstack.org/
Summary     : The management component of Saltstack with zmq protocol supported
Description :
The Salt master is the central server to which all minions connect.
Enabled commands to remote systems to be called in parallel rather
than serially.
Distribution: SUSE Linux Enterprise 15

akumacxd · Sep 22, 2019

Does stage 5 produce any error messages? Do you have a stage 5 output?

jan--f · Sep 24, 2019

> Does stage 5 produce any error messages? Do you have a stage 5 output?

Stage 5 produces no error messages.

In my opinion, after stage 2 is executed, node004 should no longer have the storage role. But when I look at pillar.items, node004 still has the storage role, and executing stage 5 produces no error message.
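
One way to narrow down where the stale role comes from (a sketch using standard Salt calls and the default /srv/pillar/ceph layout shown above):

admin:~ # salt '*' saltutil.pillar_refresh      # re-fetch the compiled pillar on every minion
admin:~ # salt 'node004*' pillar.get roles      # "storage" here means the generated pillar still assigns the role
admin:~ # grep -rl node004 /srv/pillar/ceph     # locate any generated file that still references the node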

And I found another problem: when I need to change the NTP server address, executing stage 2 doesn't work. Only executing saltutil.pillar_refresh works, but when stage 2 is executed again, the address changes back to the admin node.
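
On the NTP point, the usual pattern (a sketch; the time_server key and the exact override path are assumptions based on how the stack/default files referenced in policy.cfg are normally overridden) is to put the setting into /srv/pillar/ceph/stack/global.yml rather than editing anything under stack/default/, because the default files are regenerated and the change reverts:

admin:~ # cat /srv/pillar/ceph/stack/global.yml
# assumption: time_server is the key the stock configuration uses for the NTP source
time_server: ntp1.example.com
admin:~ # salt-run state.orch ceph.stage.2      # an override placed here should survive repeated stage 2 runs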

akumacxd · Sep 24, 2019