DeepSea
CLI reports wrong number of steps for Stage 3
I deployed a 4-node cluster with roles like so:
roles:
- - client.salt_master
- mon.a
- mgr.x
- osd.0
- - mon.b
- mgr.y
- osd.1
- - mon.c
- mgr.z
- osd.2
- - osd.3
Stage 3 completes, but reports the wrong number of steps:
...
[49/50] Executing state ceph.processes.osd...
|_ target192168000053.teuthology: wait for osd processes(cephprocesses.wait) ok
target192168000053.teuthology... ok
|_ target192168000059.teuthology: wait for osd processes(cephprocesses.wait) ok
target192168000059.teuthology... ok
|_ target192168000045.teuthology: wait for osd processes(cephprocesses.wait) ok
target192168000045.teuthology... ok
|_ target192168000054.teuthology: wait for osd processes(cephprocesses.wait) ok
target192168000054.teuthology... ok
[init] Executing state ceph.osd.restart... in target192168000045.teuthology... ok
Finished stage ceph.stage.3: succeeded=49/50 failed=0/50
The problem appears to be with the very last step (step 50 in this case). In particular, this line is anomalous:
[init] Executing state ceph.osd.restart... in target192168000045.teuthology... ok
while the expected line, [50/50] Executing state ceph.osd.restart... skipped, does not appear at all.
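To make the miscount concrete, here is a purely hypothetical sketch (none of these names come from DeepSea's actual code) of how a summary like succeeded=49/50 can arise even though every step ran: if one step is routed through an un-numbered "[init]" printer, the numbered counter never advances for it.

```python
def run_stage(steps):
    """Hypothetical progress printer (not DeepSea's real code) showing
    how a step routed through an un-numbered '[init]' printer is left
    out of the success counter."""
    total = len(steps)
    succeeded = 0
    for step in steps:
        if step.get("init"):
            # Printed without a step index, so the counter never advances.
            print("[init] Executing state {}... ok".format(step["name"]))
        else:
            succeeded += 1
            print("[{}/{}] Executing state {}... ok".format(
                succeeded, total, step["name"]))
    print("Finished stage: succeeded={}/{} failed=0/{}".format(
        succeeded, total, total))

# Two steps, the last one mis-routed: the summary reports 1/2, not 2/2.
run_stage([{"name": "ceph.processes.osd"},
           {"name": "ceph.osd.restart", "init": True}])
```

This matches the shape of the log above: the last step prints with an [init] prefix instead of [50/50], and the final tally comes up one short.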
The entire output is here: https://paste2.org/vBnxFY4V
Reproducer
SUITE="deepsea/tier3/stage-5"
SUITE_REPO="https://github.com/SUSE/ceph.git"
SUITE_BRANCH="wip-qa-rbd"
FILTER_OUT="opensuse"
NAME="$USER"
teuthology-openstack --verbose \
    --ceph-repo https://github.com/SUSE/ceph.git \
    --ceph $SUITE_BRANCH \
    --suite-repo $SUITE_REPO \
    --suite-branch $SUITE_BRANCH \
    --filter-out $FILTER_OUT \
    --key-filename /home/$USER/.ssh/storage-automation \
    --key-name storage-automation \
    --name $NAME \
    --suite $SUITE \
    --test-repo 'ses:http://10.86.0.120/artifacts/ci/d4ad635e5108134c42c5e5e477a4224a320ca2a57e0795865712f7f2f7ebf1ec/SUSE-Enterprise-Storage-6-POOL-x86_64-Build5.6' \
    --test-repo 'basesystem:http://10.86.0.120/artifacts/ci/e9bf06179ab8222426cd8b22e7e5d63f40d2b12f67d6a47a4898983f5735ccdc/SLE-15-Module-Basesystem-POOL-x86_64-Build668.1' \
    --test-repo 'server:http://10.86.0.120/artifacts/ci/618743a8bcd28a64b2881bd11eaccb24f17fe887576e109f59a64e46f9564f0a/SLE-15-Module-Server-Applications-POOL-x86_64-Build668.1' \
    --test-repo 'internal:http://10.86.0.120/artifacts/ci/55d6cc29675bc77b0b44bd3063b830547a1f271e8e42c96e4729acf1dadb0209/SUSE-Enterprise-Storage-6-POOL-Internal-x86_64-Build5.2' \
    --test-repo 'update:http://10.86.0.120/artifacts/ci/snapshot/201811060915/3577b41f35e5000f7bd8aef985ed8b32cbeb766d07cb7efbdbb8234e935d3789/SLE-15-Update' \
    ~/custom_yaml/wip-explicit-python3.yaml
Versions Report
(Provided by running:
salt-run deepsea.version
rpm -qi salt-minion
rpm -qi salt-master
)
It happens on master (0.9.8).
Interestingly, the behavior of the same test is not always the same. I just saw the stage end normally:
2018-11-11T13:14:10.325 INFO:teuthology.orchestra.run.target192168000069.stdout:[init] Executing state ceph.osd.restart... in target192168000013.teuthology... ok
2018-11-11T13:14:10.497 INFO:teuthology.orchestra.run.target192168000069.stdout:[47/50] Executing state ceph.osd.restart... skipped
2018-11-11T13:14:10.497 INFO:teuthology.orchestra.run.target192168000069.stdout:[48/50] Executing state ceph.wait...
2018-11-11T13:14:17.173 INFO:teuthology.orchestra.run.target192168000069.stdout: |_ target192168000069.teuthology: wait(wait.out) ok
2018-11-11T13:14:17.192 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000069.teuthology... ok
2018-11-11T13:14:17.401 INFO:teuthology.orchestra.run.target192168000069.stdout:[49/50] Executing state ceph.processes.osd...
2018-11-11T13:14:19.687 INFO:teuthology.orchestra.run.target192168000069.stdout: |_ target192168000013.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.701 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000013.teuthology... ok
2018-11-11T13:14:19.854 INFO:teuthology.orchestra.run.target192168000069.stdout: |_ target192168000054.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.891 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000054.teuthology... ok
2018-11-11T13:14:19.926 INFO:teuthology.orchestra.run.target192168000069.stdout: |_ target192168000030.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:19.958 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000030.teuthology... ok
2018-11-11T13:14:20.011 INFO:teuthology.orchestra.run.target192168000069.stdout: |_ target192168000069.teuthology: wait for osd processes(cephprocesses.wait) ok
2018-11-11T13:14:20.119 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000069.teuthology... ok
2018-11-11T13:14:20.339 INFO:teuthology.orchestra.run.target192168000069.stdout:[50/50] Executing state ceph.osd.restart...
2018-11-11T13:14:22.625 INFO:teuthology.orchestra.run.target192168000069.stdout: target192168000069.teuthology... ok
2018-11-11T13:14:22.779 INFO:teuthology.orchestra.run.target192168000069.stdout:Finished stage ceph.stage.3: succeeded=50/50 failed=0/50
However, I wonder whether it is expected for [init] Executing state ceph.osd.restart... in target192168000013.teuthology... ok
lines to appear in the "steps execution" phase at all. There is already an init phase before the steps-execution phase; don't these [init]
lines belong up there?
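If the layout the log seems to promise were enforced, all [init] steps would be printed before the numbered sequence begins, and the counter could always reach total/total. A minimal sketch of that expected ordering (again purely hypothetical, not DeepSea's actual implementation):

```python
def run_stage_expected(steps):
    """Hypothetical sketch of the expected layout: all '[init]' steps run
    first, then the numbered sequence, so the success counter can reach
    total/total. Not DeepSea's actual implementation."""
    init_steps = [s for s in steps if s.get("init")]
    numbered = [s for s in steps if not s.get("init")]
    # Init phase: un-numbered steps, printed before the counter starts.
    for step in init_steps:
        print("[init] Executing state {}... ok".format(step["name"]))
    # Steps-execution phase: only numbered steps, counted to completion.
    total = len(numbered)
    for i, step in enumerate(numbered, 1):
        print("[{}/{}] Executing state {}... ok".format(i, total, step["name"]))
    print("Finished stage: succeeded={}/{} failed=0/{}".format(
        total, total, total))
```

Under this ordering, an [init] line in the middle of the numbered sequence (as in the log above) would be impossible by construction.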