BUG: Cannot deploy any type of Ceph cluster
I can no longer deploy a multi-node or single-node cluster, regardless of whether it is ses6, ses7, ses7p or pacific.
For example, the following commands are used to deploy the clusters:
$ sesdev create ses7 --single-node --non-interactive ses7-mini
$ sesdev create pacific --single-node --non-interactive pacific-mini
$ sesdev create ses7p --non-interactive ses7p-default
One of the following errors always appears and aborts the deployment.
master: ++ ceph-salt status
master: Cluster: 1 minions, 0 hosts managed by cephadm
master: OS: SUSE Linux Enterprise Server 15 SP2
master: Ceph RPMs: Not installed
master: Config: OK
master: ++ zypper repos --details
master: # | Alias | Name | Enabled | GPG Check | Refresh | Priority | Type | URI | Service
master: ---+--------------------+-------------------------------------------+---------+-----------+---------+----------+--------+----------------------------------------------------------------------------------------------------------------+--------
master: 1 | SUSE_CA | SUSE Internal CA Certificate (SLE_15_SP2) | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.suse.de/ibs/SUSE:/CA/SLE_15_SP2/ |
master: 2 | base | base | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Basesystem/15-SP2/x86_64/product/ |
master: 3 | devel-repo-1 | devel-repo-1 | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/Devel:/Storage:/7.0/images/repo/SUSE-Enterprise-Storage-7-POOL-x86_64-Media1/ |
master: 4 | product | product | Yes | (r ) Yes | Yes | 99 | rpm-md | http://dist.suse.de/ibs/SUSE/Products/SLE-Product-SLES/15-SP2/x86_64/product/ |
master: 5 | product-update | product-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://dist.suse.de/ibs/SUSE/Updates/SLE-Product-SLES/15-SP2/x86_64/update/ |
master: 6 | server-apps | server-apps | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Server-Applications/15-SP2/x86_64/product/ |
master: 7 | server-apps-update | server-apps-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Server-Applications/15-SP2/x86_64/update/ |
master: 8 | storage | storage | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/Storage/7/x86_64/product/ |
master: 9 | storage-update | storage-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/Storage/7/x86_64/update/ |
master: 10 | update | update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Basesystem/15-SP2/x86_64/update/ |
master: ++ zypper info cephadm
master: ++ grep -E '(^Repo|^Version)'
master: Repository : storage-update
master: Version : 15.2.16.99+g96ce9b152f5-150200.3.37.1
master: ++ ceph-salt --version
master: ceph-salt 15.2.19+1649909331.ge2933b3
master: ++ stdbuf -o0 ceph-salt -ldebug apply --non-interactive
master: Syncing minions with the master...
master: Checking if minions respond to ping...
master: Pinging 1 minions...
master: Checking if ceph-salt formula is available...
master: Checking if minions have functioning DNS...
master: Running DNS lookups on 1 minions...
master: Checking if there is an existing Ceph cluster...
master: No Ceph cluster deployed yet
master: Installing python3-ntplib on master.ses7-mini.test...
master: Probing external time server pool.ntp.org (attempt 1 of 10)...
master: Checking for FQDN environment on 1 minions...
master: All 1 minions have non-FQDN environment. Good.
master: Resetting execution grains...
master: Starting...
master: Starting the execution of: salt -G 'ceph-salt:member' state.apply ceph-salt
master:
master:
master: Finished execution of ceph-salt formula
master:
master: Summary: Total=1 Succeeded=0 Warnings=0 Failed=1
master: "ceph-salt apply" exit code: 0
master: ++ echo '"ceph-salt apply" exit code: 0'
master: ++ set +x
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: Warning: Permanently added 'master.ses7-mini.test' (ECDSA) to the list of known hosts.
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
master: ...
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
master: ...
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
master: ...
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh master.ses7-mini.test cephadm ls
master: MONs in cluster (actual/expected): 1/1 (3020 seconds to timeout)
master: MGRs in cluster (actual/expected): 1/1 (3020 seconds to timeout)
master: +++ set +x
master: ++ ceph status
master: 2023-06-20T15:18:13.533+0200 7f596648b700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
master: 2023-06-20T15:18:13.533+0200 7f596648b700 -1 AuthRegistry(0x7f596005e778) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 AuthRegistry(0x7f596648a060) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
master: 2023-06-20T15:18:13.541+0200 7f5965489700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
master: [errno 13] RADOS permission denied (error connecting to the cluster)
master: +++ err_report 635
master: +++ local hn
master: +++ set +x
master: Error in provisioner script trapped!
master: => hostname: master
master: => script: /tmp/vagrant-shell
master: => line number: 635
master: Bailing out!
Command '['vagrant', 'up', '--no-destroy-on-error', '--provision']' failed: ret=1 stderr:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
master: Populating default values...
master: Adding master.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/cephadm add master.ses7p-default.test
master: Adding master.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/admin add master.ses7p-default.test
master: Adding master.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/minions add node1.ses7p-default.test
master: Adding node1.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node1.ses7p-default.test
master: Adding node1.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/minions add node2.ses7p-default.test
master: Adding node2.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node2.ses7p-default.test
master: Adding node2.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/minions add node3.ses7p-default.test
master: Adding node3.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node3.ses7p-default.test
master: Adding node3.ses7p-default.test...
master: 1 minion added.
master: ++ ceph-salt config /ceph_cluster/roles/bootstrap set node1.ses7p-default.test
master: Value set.
master: ++ ceph-salt config /cephadm_bootstrap/mon_ip set 10.20.83.201
master: Value set.
master: ++ ceph-salt config /ssh/ generate
master: Key pair generated.
master: ++ ceph-salt config /time_server/servers add master.ses7p-default.test
master: Value added.
master: ++ ceph-salt config /time_server/external_servers add pool.ntp.org
master: Value added.
master: ++ ceph-salt config /time_server/subnet set 10.20.83.0/24
master: Value set.
master: ++ ceph-salt config /cephadm_bootstrap/ceph_image_path set registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph
master: Value set.
master: ++ ceph-salt config /cephadm_bootstrap/dashboard/username set admin
master: Value set.
master: ++ ceph-salt config /cephadm_bootstrap/dashboard/password set admin
master: Value set.
master: ++ ceph-salt config /cephadm_bootstrap/dashboard/force_password_update disable
master: Disabled.
master: ++ ceph-salt config ls
master: o- / ......................................................................................................................... [...]
master: o- ceph_cluster ............................................................................................................ [...]
master: | o- minions ........................................................................................................ [Minions: 4]
master: | | o- master.ses7p-default.test ................................................................................ [cephadm, admin]
master: | | o- node1.ses7p-default.test ............................................................................. [bootstrap, cephadm]
master: | | o- node2.ses7p-default.test ........................................................................................ [cephadm]
master: | | o- node3.ses7p-default.test ........................................................................................ [cephadm]
master: | o- roles ................................................................................................................. [...]
master: | o- admin ........................................................................................................ [Minions: 1]
master: | | o- master.ses7p-default.test ........................................................................ [Other roles: cephadm]
master: | o- bootstrap ...................................................................................... [node1.ses7p-default.test]
master: | o- cephadm ...................................................................................................... [Minions: 4]
master: | | o- master.ses7p-default.test .......................................................................... [Other roles: admin]
master: | | o- node1.ses7p-default.test ....................................................................... [Other roles: bootstrap]
master: | | o- node2.ses7p-default.test ............................................................................... [No other roles]
master: | | o- node3.ses7p-default.test ............................................................................... [No other roles]
master: | o- tuned ............................................................................................................... [...]
master: | o- latency .................................................................................................... [no minions]
master: | o- throughput ................................................................................................. [no minions]
master: o- cephadm_bootstrap ....................................................................................................... [...]
master: | o- advanced .............................................................................................................. [...]
master: | o- ceph_conf ............................................................................................................. [...]
master: | o- ceph_image_path ................................... [registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph]
master: | o- dashboard ............................................................................................................. [...]
master: | | o- force_password_update .......................................................................................... [disabled]
master: | | o- password .......................................................................................................... [admin]
master: | | o- ssl_certificate ................................................................................................. [not set]
master: | | o- ssl_certificate_key ............................................................................................. [not set]
master: | | o- username .......................................................................................................... [admin]
master: | o- mon_ip ....................................................................................................... [10.20.83.201]
master: o- containers .............................................................................................................. [...]
master: | o- registries_conf ................................................................................................... [enabled]
master: | | o- registries ........................................................................................................ [empty]
master: | o- registry_auth ......................................................................................................... [...]
master: | o- password ........................................................................................................ [not set]
master: | o- registry ........................................................................................................ [not set]
master: | o- username ........................................................................................................ [not set]
master: o- ssh ............................................................................................................ [Key Pair set]
master: | o- private_key ............................................................... [36:de:a7:d5:d8:ea:30:7b:fe:63:5b:a9:45:83:23:dc]
master: | o- public_key ................................................................ [36:de:a7:d5:d8:ea:30:7b:fe:63:5b:a9:45:83:23:dc]
master: o- time_server ......................................................................................................... [enabled]
master: o- external_servers ........................................................................................................ [1]
master: | o- pool.ntp.org ........................................................................................................ [...]
master: o- servers ................................................................................................................. [1]
master: | o- master.ses7p-default.test ........................................................................................... [...]
master: o- subnet ...................................................................................................... [10.20.83.0/24]
master: ++ ceph-salt export --pretty
master: {
master: "bootstrap_minion": "node1.ses7p-default.test",
master: "bootstrap_mon_ip": "10.20.83.201",
master: "container": {
master: "images": {
master: "ceph": "registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph"
master: },
master: "registries_enabled": true
master: },
master: "dashboard": {
master: "password": "admin",
master: "password_update_required": false,
master: "username": "admin"
master: },
master: "minions": {
master: "admin": [
master: "master.ses7p-default.test"
master: ],
master: "all": [
master: "node1.ses7p-default.test",
master: "master.ses7p-default.test",
master: "node2.ses7p-default.test",
master: "node3.ses7p-default.test"
master: ],
master: "cephadm": [
master: "node1.ses7p-default.test",
master: "master.ses7p-default.test",
master: "node2.ses7p-default.test",
master: "node3.ses7p-default.test"
master: ],
master: "latency": [],
master: "throughput": []
master: },
master: "ssh": {
master: "private_key": "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEA2Kw94bCDaxEMkoY5ieTv2t4GK8torc/yp9AqiUH31OB+3nWn\nenM3u2vGrA9PIhSS/65lgUkZTvCLXarWpXqOJiuQxvinLc12MEFhn0ZJW+MamzM2\n93yuQFF3n65TqhPpYtr6Qb76xdjLpEmEFEcc6woohNBRpjI8jnQDXABeGELKpM73\nrx+DYUXOeE8MD+mX2PjvGj4xvfZ+ENUQfNasEd+rLC6m4YZHuQtQndwvSMVxiIri\netwsheCyyvywRfL9zPhAG3S/XFmpJz1CvGOCKiqNVfLcSWEYx0VuYFgKaA27N/40\nN7424z4CgvTHzNLwlACuCbKooEwmFwjmivk+UQIDAQABAoIBABPMxqFiiNnefpp/\nsU+eeAQ1TJVRMt1KTtOCwHZXTMtbYgCYeg/kqkPCZS75Pasgu++pO02hlVJbRTMP\nruqDjOykR8hE9fsHpuyhNudwC/ldgyOKXjQmxMIsJ600CCF3TRkrZ1nddstgZLic\nPrl/J597Z8k+Q63XMqU2aQ2t22tmOQ4gGz8VomjiZ3G3VCZgA352j1HUsouYU8mU\n+DB5d563yezs6pf9cNWqpV6HpLXnRzx9GpIxfDx5GAji8/8+4CTEV4Ei4iCoX75i\n/fQ649x2s/zNUFhPHbEgc3ZEpDndxaZoy1yV4/eYNN8ySeRWdrIf1Ylo8MmVJ6U/\nOAeny1kCgYEA43aVZlDlQO9m6kJq2bPxDvIxXgdwSjH2HJWEbbudD/+9njCVZjst\niDSdrC4/ad1/zzwI6H2Dlv76cY6EY+39bFfrgTcwHT2Yy4B/nLkcUexlM13LFZic\n6WPnb+DA4rhzKcHrua0AWSwpvCnckhV1pcNaZE5U0iBzFl3Bppn3hRkCgYEA89sa\nln6pKgJlsrNDo1gUTcjssKw2nwrqC7SHtSdpxQN1gueBDrn/ZS9rDKCDYAordlAF\nDItxF1kdeLdGLaIBh/YnXsBuMUIpuG9pg+nXGXFNwBxLe+wwsLSgW9Xfv6BCEUIo\n3a+CJzIvdRFwsKtlegYupV5lJ7BfUQ3tJ/+XMfkCgYEA3lHODjXtDL2xMi/+bZAR\ngVE47TWKDAqvCRseV35zMer9Mzs7GrOmeiUrItoFAv0Kacu8zTe4QQIwWIM6ZM18\nz8NTHHWLYlkNGYIbuFu5EV1jQIRg9Ve3reoGj/P1suMjNGIketNbrsyach3cRzAQ\nUBcTJ0zkXIh41BiJKMP+CCkCgYEArFB5OzsJgnvrLRlrhDMrNcPzLOykNEJcHCVX\nd/T/0o2dLgE0uxlHlVKqjGOoMec9yv7EcpbeNSdtoe2wE3LVLiQMsfG8a+Za4M8p\nemN08a+Ux1m3JTxDM7qPThWVZC10QgnEItJwYA4gZtMKFG0o6c8Qix5m0GLbF8WF\nfawoRNECgYAcgzUDaisvnn5kGpVttqyov77Qvk8Y86Pc+Xa96TDYYWs8ndFxjVC8\nY596EgD1wP8UqA3JnDNTLeSYJ/KJm1VUFEdIomTG2dlbX5IvpO5dPmpLXxmXmtbB\nXB2rYZ1yUBnb7M7wxV7YtBD4I6voFbSxMBi2OowMNz4uMtiqBemVcg==\n-----END RSA PRIVATE KEY-----",
master: "public_key": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYrD3hsINrEQyShjmJ5O/a3gYry2itz/Kn0CqJQffU4H7edad6cze7a8asD08iFJL/rmWBSRlO8Itdqtaleo4mK5DG+KctzXYwQWGfRklb4xqbMzb3fK5AUXefrlOqE+li2vpBvvrF2MukSYQURxzrCiiE0FGmMjyOdANcAF4YQsqkzvevH4NhRc54TwwP6ZfY+O8aPjG99n4Q1RB81qwR36ssLqbhhke5C1Cd3C9IxXGIiuJ63CyF4LLK/LBF8v3M+EAbdL9cWaknPUK8Y4IqKo1V8txJYRjHRW5gWApoDbs3/jQ3vjbjPgKC9MfM0vCUAK4JsqigTCYXCOaK+T5R"
master: },
master: "time_server": {
master: "enabled": true,
master: "external_time_servers": [
master: "pool.ntp.org"
master: ],
master: "server_hosts": [
master: "master.ses7p-default.test"
master: ],
master: "subnet": "10.20.83.0/24"
master: }
master: }
master: ++ ceph-salt status
master: Cluster: 4 minions, 0 hosts managed by cephadm
master: OS: SUSE Linux Enterprise Server 15 SP3
master: Ceph RPMs: Not installed
master: Config: OK
master: ++ zypper repos --details
master: # | Alias | Name | Enabled | GPG Check | Refresh | Priority | Type | URI | Service
master: ---+--------------------+-------------------------------------------+---------+-----------+---------+----------+--------+---------------------------------------------------------------------------------------------------------------------------+--------
master: 1 | SUSE_CA | SUSE Internal CA Certificate (SLE_15_SP3) | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.suse.de/ibs/SUSE:/CA/SLE_15_SP3/ |
master: 2 | base | base | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Basesystem/15-SP3/x86_64/product/ |
master: 3 | devel-repo-1 | devel-repo-1 | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/Devel:/Storage:/7.0:/Pacific/images/repo/SUSE-Enterprise-Storage-7.1-POOL-x86_64-Media1/ |
master: 4 | product | product | Yes | (r ) Yes | Yes | 99 | rpm-md | http://dist.suse.de/ibs/SUSE/Products/SLE-Product-SLES/15-SP3/x86_64/product/ |
master: 5 | product-update | product-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://dist.suse.de/ibs/SUSE/Updates/SLE-Product-SLES/15-SP3/x86_64/update/ |
master: 6 | server-apps | server-apps | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Server-Applications/15-SP3/x86_64/product/ |
master: 7 | server-apps-update | server-apps-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Server-Applications/15-SP3/x86_64/update/ |
master: 8 | storage | storage | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/Storage/7.1/x86_64/product/ |
master: 9 | storage-update | storage-update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/Storage/7.1/x86_64/update/ |
master: 10 | update | update | Yes | (r ) Yes | Yes | 99 | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Basesystem/15-SP3/x86_64/update/ |
master: ++ grep -E '(^Repo|^Version)'
master: ++ zypper info cephadm
master: Repository : storage-update
master: Version : 16.2.13.66+g54799ee0666-150300.3.11.1
master: ++ ceph-salt --version
master: ceph-salt 16.2.4+1671578301.g6193518
master: ++ stdbuf -o0 ceph-salt -ldebug apply --non-interactive
master: Syncing minions with the master...
master: Checking if minions respond to ping...
master: Pinging 4 minions...
master: Checking if ceph-salt formula is available...
master: Checking if minions have functioning DNS...
master: Running DNS lookups on 4 minions...
master: Checking if there is an existing Ceph cluster...
master: No Ceph cluster deployed yet
master: Installing python3-ntplib on master.ses7p-default.test...
master: Probing external time server pool.ntp.org (attempt 1 of 10)...
master: Checking for FQDN environment on 4 minions...
master: All 4 minions have non-FQDN environment. Good.
master: Resetting execution grains...
master: Starting...
master: Starting the execution of: salt -G 'ceph-salt:member' state.apply ceph-salt
master:
master:
master: Finished execution of ceph-salt formula
master:
master: Summary: Total=4 Succeeded=0 Warnings=0 Failed=4
master: "ceph-salt apply" exit code: 0
master: ++ echo '"ceph-salt apply" exit code: 0'
master: ++ set +x
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: Warning: Permanently added 'node1.ses7p-default.test' (ECDSA) to the list of known hosts.
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
master: ...
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
master: ...
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
master: ...
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 0/1 (3020 seconds to timeout)
master: MGRs in cluster (actual/expected): 0/1 (3020 seconds to timeout)
master: ...
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
master: +++ set +x
master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
master: +++ ssh node1.ses7p-default.test cephadm ls
master: +++ set +x
master: MONs in cluster (actual/expected): 1/1 (2990 seconds to timeout)
master: MGRs in cluster (actual/expected): 1/1 (2990 seconds to timeout)
master: ++ ceph status
master: Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
master: +++ err_report 659
master: +++ local hn
master: +++ set +x
master: Error in provisioner script trapped!
master: => hostname: master
master: => script: /tmp/vagrant-shell
master: => line number: 659
master: Bailing out!
Command '['vagrant', 'up', '--no-destroy-on-error', '--provision']' failed: ret=1 stderr:
==> master: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
This looks like the same thing we've hit intermittently inside sesdev CI, notably the most recent failure of https://github.com/SUSE/sesdev/pull/696 (output at http://see.prv.suse.net:8080/blue/organizations/jenkins/sesdev-integration/detail/PR-696/1/pipeline). The notes in #689 and #691 are also relevant.
A couple of things stand out to me in the above output:
master: Finished execution of ceph-salt formula
master:
master: Summary: Total=1 Succeeded=0 Warnings=0 Failed=1
master: "ceph-salt apply" exit code: 0
master: Finished execution of ceph-salt formula
master:
master: Summary: Total=4 Succeeded=0 Warnings=0 Failed=4
master: "ceph-salt apply" exit code: 0
That summary information is produced by ceph-salt at https://github.com/ceph/ceph-salt/blob/619351846592062c245e22555fea399d8f3d5c02/ceph_salt/execute.py#L1288. The counters indicate the number of minions on which salt -G 'ceph-salt:member' state.apply ceph-salt succeeded. So in both your test runs, that state.apply failed on all the minions.
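To see what actually failed on each minion, it might be worth re-running the same state.apply by hand and keeping the full state output (a rough sketch; the log path is just an example):

# re-run the formula exactly as ceph-salt does, but keep the full per-state output
salt -G 'ceph-salt:member' state.apply ceph-salt --state-output=full 2>&1 | tee /tmp/ceph-salt-state.log
# then look at the states that did not succeed
grep -A8 'Result: False' /tmp/ceph-salt-state.log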
Given that all the minions failed, surely it's incorrect for ceph-salt apply to give us exit code 0 indicating success! Also, why is ceph-salt not providing any diagnostic information about what exactly failed on the minions? So I reckon that's two ceph-salt bugs right there.
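Until the exit code is fixed in ceph-salt, sesdev's provisioner script could in principle guard against this by parsing the summary line itself; a minimal sketch of the idea (the log path is arbitrary, and this is not what sesdev currently does):

# capture the ceph-salt output so the summary line can be inspected afterwards
stdbuf -o0 ceph-salt -ldebug apply --non-interactive 2>&1 | tee /tmp/ceph-salt-apply.log
# treat any failed minion as a hard error, regardless of ceph-salt's exit code
if grep -Eq 'Summary: .*Failed=[1-9]' /tmp/ceph-salt-apply.log; then
    echo '"ceph-salt apply" summary reports failed minions' >&2
    exit 1
fi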
As for the subsequent failures running ceph status, the first ("auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory") will be due to /tmp/ceph.client.admin/keyring not having been copied to /etc/ceph/ yet; ceph-salt is meant to copy that file. The second ("Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)") is presumably due to a missing /etc/ceph/ceph.conf, which should have been created by cephadm bootstrap.
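A quick way to confirm which of those files is actually missing would be to check the admin node directly (hostname taken from the single-node run; adjust for the multi-node one):

# the two files ceph status needs on the admin node
ssh master.ses7-mini.test ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
# see whether bootstrap left a keyring or conf anywhere else
ssh master.ses7-mini.test 'find /etc/ceph /var/lib/ceph -maxdepth 3 \( -name "*keyring*" -o -name ceph.conf \) 2>/dev/null'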
Honestly, it feels to me like what's happening here is that ceph-salt invokes cephadm bootstrap, but execution somehow returns to ceph-salt before cephadm bootstrap has actually completed. At least, looking at the logs I have in http://see.prv.suse.net:8080/blue/organizations/jenkins/sesdev-integration/detail/PR-696/1/artifacts, I can see there is still activity in cephadm.out after ceph status fails. I don't really know how that's possible, but that's my current working hunch.
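If that hunch is right, it should be observable at the moment ceph status fails, e.g. by checking whether a bootstrap process is still alive on the bootstrap node and whether its log is still growing (a sketch only; /var/log/ceph/cephadm.log is where cephadm normally logs, and the cephadm.out from the CI artifacts may live elsewhere):

# anything cephadm-related still running on the bootstrap node?
ssh node1.ses7p-default.test pgrep -af cephadm
# is the bootstrap log still being written to while ceph status is already failing?
ssh node1.ses7p-default.test tail -n 5 -f /var/log/ceph/cephadm.log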
@votdev can you please try something for me? Remove any of my experimental patches applied to sesdev, then rerun sesdev create, but add the --salt parameter. This forces sesdev to invoke salt state.apply directly to apply the ceph-salt formula, rather than running everything through the ceph-salt executor. For example:
sesdev create ses7p --salt --non-interactive ses7p-default
Does that give you a successful deployment?