
CLI reports wrong number of steps for Stage 3

smithfarm opened this issue 5 years ago · 1 comment

On a 4-node cluster configured like so:

  roles:
  - - client.salt_master
    - mon.a
    - mgr.x
    - osd.0
  - - mon.b
    - mgr.y
    - osd.1
  - - mon.c
    - mgr.z
    - osd.2
  - - osd.3

Stage 3 completes, but reports the wrong number of steps:

...
[49/50] Executing state ceph.processes.osd...
         |_  target192168000053.teuthology: wait for osd processes(cephprocesses.wait) ok
         target192168000053.teuthology... ok
         |_  target192168000059.teuthology: wait for osd processes(cephprocesses.wait) ok
         target192168000059.teuthology... ok
         |_  target192168000045.teuthology: wait for osd processes(cephprocesses.wait) ok
         target192168000045.teuthology... ok
         |_  target192168000054.teuthology: wait for osd processes(cephprocesses.wait) ok
         target192168000054.teuthology... ok
[init] Executing state ceph.osd.restart...                in target192168000045.teuthology... ok
Finished stage ceph.stage.3: succeeded=49/50 failed=0/50

The problem appears to be with the very last step (Step 50 in this case) - in particular, this line is anomalous:

[init] Executing state ceph.osd.restart...                in target192168000045.teuthology... ok

and the expected "[50/50] Executing state ceph.osd.restart... skipped" line does not appear at all.

The entire output is here: https://paste2.org/vBnxFY4V
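
For what it's worth, the missing step can be confirmed mechanically. The helper below is hypothetical (not part of DeepSea) and assumes the paste has been saved locally as stage3.log; it just lists which of the 50 numbered step markers never show up:

    import re

    # Hypothetical helper, not part of DeepSea: scan a saved copy of the
    # stage output and report which numbered steps never appeared.
    with open("stage3.log") as f:
        text = f.read()

    seen = {int(m.group(1)) for m in re.finditer(r"\[(\d+)/50\]", text)}
    print("missing steps:", sorted(set(range(1, 51)) - seen))  # bad run: [50]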

Reproducer

SUITE="deepsea/tier3/stage-5"; SUITE_REPO="https://github.com/SUSE/ceph.git"; SUITE_BRANCH="wip-qa-rbd"; FILTER_OUT="opensuse"; NAME="$USER"; teuthology-openstack --verbose --ceph-repo https://github.com/SUSE/ceph.git --ceph $SUITE_BRANCH --suite-repo $SUITE_REPO --suite-branch $SUITE_BRANCH --filter-out $FILTER_OUT --key-filename /home/$USER/.ssh/storage-automation --key-name storage-automation --name $NAME --suite $SUITE --test-repo 'ses:http://10.86.0.120/artifacts/ci/d4ad635e5108134c42c5e5e477a4224a320ca2a57e0795865712f7f2f7ebf1ec/SUSE-Enterprise-Storage-6-POOL-x86_64-Build5.6' --test-repo 'basesystem:http://10.86.0.120/artifacts/ci/e9bf06179ab8222426cd8b22e7e5d63f40d2b12f67d6a47a4898983f5735ccdc/SLE-15-Module-Basesystem-POOL-x86_64-Build668.1' --test-repo 'server:http://10.86.0.120/artifacts/ci/618743a8bcd28a64b2881bd11eaccb24f17fe887576e109f59a64e46f9564f0a/SLE-15-Module-Server-Applications-POOL-x86_64-Build668.1' --test-repo 'internal:http://10.86.0.120/artifacts/ci/55d6cc29675bc77b0b44bd3063b830547a1f271e8e42c96e4729acf1dadb0209/SUSE-Enterprise-Storage-6-POOL-Internal-x86_64-Build5.2' --test-repo 'update:http://10.86.0.120/artifacts/ci/snapshot/201811060915/3577b41f35e5000f7bd8aef985ed8b32cbeb766d07cb7efbdbb8234e935d3789/SLE-15-Update' ~/custom_yaml/wip-explicit-python3.yaml

Versions Report

(Provided by running: salt-run deepsea.version, rpm -qi salt-minion, rpm -qi salt-master)

It happens on master (0.9.8).

smithfarm, Nov 11 '18 11:11

Interestingly, the behavior (in the same test) is not always the same. I just saw the stage end normally:

2018-11-11T13:14:10.325 INFO:teuthology.orchestra.run.target192168000069.stdout:[init] Executing state ceph.osd.restart...                in target192168000013.teuthology... ok
2018-11-11T13:14:10.497 INFO:teuthology.orchestra.run.target192168000069.stdout:[47/50] Executing state ceph.osd.restart... skipped
2018-11-11T13:14:10.497 INFO:teuthology.orchestra.run.target192168000069.stdout:[48/50] Executing state ceph.wait...
2018-11-11T13:14:17.173 INFO:teuthology.orchestra.run.target192168000069.stdout:         |_  target192168000069.teuthology: wait(wait.out) ok
2018-11-11T13:14:17.192 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000069.teuthology... ok
2018-11-11T13:14:17.401 INFO:teuthology.orchestra.run.target192168000069.stdout:[49/50] Executing state ceph.processes.osd...
2018-11-11T13:14:19.687 INFO:teuthology.orchestra.run.target192168000069.stdout:         |_  target192168000013.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.701 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000013.teuthology... ok
2018-11-11T13:14:19.854 INFO:teuthology.orchestra.run.target192168000069.stdout:         |_  target192168000054.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.891 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000054.teuthology... ok
2018-11-11T13:14:19.926 INFO:teuthology.orchestra.run.target192168000069.stdout:         |_  target192168000030.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.958 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000030.teuthology... ok
2018-11-11T13:14:20.011 INFO:teuthology.orchestra.run.target192168000069.stdout:         |_  target192168000069.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:20.119 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000069.teuthology... ok
2018-11-11T13:14:20.339 INFO:teuthology.orchestra.run.target192168000069.stdout:[50/50] Executing state ceph.osd.restart...
2018-11-11T13:14:22.625 INFO:teuthology.orchestra.run.target192168000069.stdout:         target192168000069.teuthology... ok
2018-11-11T13:14:22.779 INFO:teuthology.orchestra.run.target192168000069.stdout:Finished stage ceph.stage.3: succeeded=50/50 failed=0/50

However, I wonder whether it's expected for "[init] Executing state ceph.osd.restart... in target192168000013.teuthology... ok" lines to show up during the "steps execution" phase at all. There is already an init phase before the steps execution phase - don't these [init] lines belong up there?
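
Purely as a guess at the mechanism (the sketch below is made up for illustration, not DeepSea's monitor code; StageProgress and its method names are hypothetical): if the unnumbered [init] lines never advance the succeeded counter, and the numbered "skipped" line for the same state is printed separately, then failing to emit that last numbered line is enough to turn 50/50 into 49/50:

    # Hypothetical sketch, not DeepSea source: numbered lines advance the
    # counter, unnumbered "[init]" lines do not.
    class StageProgress:
        def __init__(self, total):
            self.total = total
            self.completed = 0

        def planned_step(self, state, result="ok"):
            # Numbered line: this is what advances the succeeded counter.
            self.completed += 1
            print(f"[{self.completed}/{self.total}] Executing state {state}... {result}")

        def dynamic_event(self, state, minion):
            # Unnumbered "[init]" line: printed as it happens, never counted.
            print(f"[init] Executing state {state}... in {minion}... ok")

        def finish(self, stage):
            print(f"Finished stage {stage}: "
                  f"succeeded={self.completed}/{self.total} failed=0/{self.total}")

    p = StageProgress(total=50)
    p.completed = 49  # pretend steps 1-49 have already been counted
    p.dynamic_event("ceph.osd.restart", "target192168000045.teuthology")
    # In the failing run the numbered line below never appears; with it,
    # the summary would read 50/50 as in the good run.
    # p.planned_step("ceph.osd.restart", result="skipped")
    p.finish("ceph.stage.3")  # prints succeeded=49/50

If that is roughly how the counting works, the bug would be in whatever decides to emit (or not emit) the final numbered "skipped" line, rather than in the counting itself.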

smithfarm, Nov 11 '18 13:11