jetlag
jetlag copied to clipboard
Flexible deployment option
New Feature Request:
- Recognize hardware that fails or likely to fail during the install.
- Continue to deploy on the healthy set of nodes skipping the failed hardware.
- Report which nodes failed to deploy
Large scale cluster deployment get stuck due to some bad nodes (bad hardware or incorrect/unsupported configuration). The installation should be able to skip such nodes for installation.
I would suggest we narrow the scope of this before attempting to take something like this on. Perhaps the first step would be a validation playbook which confirms all hardware in your allocation is valid and has expected network connectivity? On 2nd thought shouldn't the lab only hand out hardware that is 100% operational and the failures we are discussing here are ones that can occur during actual cluster deployment?