DeepSea
ceph.restart orchestration could check if all needed roles are deployed and show improved error message if any roles are missing
DeepSea master branch, tip is 12965311c0c6f4ad69e5854f498907df7f9d1cea
On a single-node SLE15/SES6 cluster with 4 external drives, I ran Stages 0-3 and got HEALTH_OK.
Then I ran salt-run state.orch ceph.smoketests
and it failed with the following error blob:
target147135133120.teuthology_master:
Data failed to compile:
----------
Rendering SLS 'base:ceph.smoketests.restart.mds' failed: mapping values are not allowed here; line 9
---
[...]
reset systemctl initially for mds:
salt.state:
- tgt: Exception occurred in runner select.one_minion: Traceback (most recent call last): <======================
File "/usr/lib/python3.6/site-packages/salt/client/mixins.py", line 387, in _low
data['return'] = self.functions[fun](*args, **kwargs)
File "/srv/modules/runners/select.py", line 96, in one_minion
return ret[0]
IndexError: list index out of range
[...]
---
----------
Rendering SLS 'base:ceph.smoketests.restart.rgw' failed: mapping values are not allowed here; line 9
---
[...]
reset systemctl initially for rgw:
salt.state:
- tgt: Exception occurred in runner select.one_minion: Traceback (most recent call last): <======================
File "/usr/lib/python3.6/site-packages/salt/client/mixins.py", line 387, in _low
data['return'] = self.functions[fun](*args, **kwargs)
File "/srv/modules/runners/select.py", line 96, in one_minion
return ret[0]
IndexError: list index out of range
[...]
---
@jschmid1 tells me this is because the ceph.restart orchestration requires a cluster with mds and rgw roles deployed.
The above error occurs when these roles are absent.
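For illustration, the traceback boils down to indexing an empty list of matching minions (`return ret[0]`). A minimal sketch of a guarded selection follows. The function name `pick_one_minion` and its arguments are hypothetical and are not the actual `select.one_minion` signature; the point is only that an empty match list can be turned into a readable message instead of an `IndexError`.

```python
def pick_one_minion(matches, role):
    """Return the first minion from `matches`, or fail with a readable error.

    Hypothetical helper: `matches` stands in for the list of minions that
    have `role` assigned. It is not DeepSea's real runner API.
    """
    if not matches:
        # An empty list is what made `ret[0]` raise IndexError in the report.
        raise LookupError(
            "no minion has the '{}' role assigned; "
            "deploy the role or skip this test".format(role)
        )
    return matches[0]
```

Called with an empty list for `mds`, this raises a `LookupError` that names the missing role rather than an opaque rendering failure.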
Right, because the smoketests do not implement an additional check for whether the roles are actually deployed.
It's worth discussing whether they should.
I think we solved that by re-implementing the way we run those restart tests.
Yes, the CI can now run these tests, but reopening the issue to track the problematic error handling.
This could be resolved by implementing a validate runner for the functests and triggering it in init.sls (similar to how it is triggered by the stage orchestrations).
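As a rough idea of what such a check might look like, here is a sketch in plain Python. Everything in it is an assumption made for illustration: the names `REQUIRED_ROLES`, `missing_roles` and `validate_roles`, the role-to-minions mapping, and the shape of the returned dict are not taken from the existing validate runner.

```python
# Roles the restart smoketests need, per the discussion above.
REQUIRED_ROLES = ("mds", "rgw")


def missing_roles(role_to_minions, required=REQUIRED_ROLES):
    """Return the required roles that have no minion assigned."""
    return [role for role in required if not role_to_minions.get(role)]


def validate_roles(role_to_minions, required=REQUIRED_ROLES):
    """Return a pass/fail result with a human-readable error message.

    `role_to_minions` is a hypothetical mapping such as
    {"mds": ["node1"], "rgw": []} built from the cluster's role assignment.
    """
    missing = missing_roles(role_to_minions, required)
    if missing:
        return {
            "passed": False,
            "error": "missing required role(s): " + ", ".join(missing),
        }
    return {"passed": True, "error": ""}
```

On the cluster from the report, `validate_roles({"mon": ["node1"]})` would fail up front and name both `mds` and `rgw`, before any of the restart SLS files are rendered.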