Assessment of cluster-level hypervisor parameters during cluster-verify
This commit implements the assessment of hypervisor parameters set at the cluster level. It is achieved through a new method of the hypervisor abstraction, which gets called for each enabled hypervisor during the gnt-cluster verify run. Currently it is only implemented for the KVM hypervisor and tries to find suboptimal values for parameters such as cpu_type.
Findings are printed in a new section of the verify output and are purely informational: they do not have any impact on the return code of the command, and they can be turned off with a new option.
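To illustrate the idea, here is a minimal sketch of how such an assessment hook could look. It is not the code from this commit: the class names, the method name `assess_parameters`, the `HYPERVISORS` registry and the use of `spice_bind` to detect a configured Spice setup are all assumptions made for illustration. Only `cpu_type`, `spice_use_tls` and the two messages are taken from the example output in this PR.

```python
class BaseHypervisor:
    """Hypothetical stand-in for the hypervisor abstraction."""

    def assess_parameters(self, hvparams):
        """Return a list of informational findings (none by default)."""
        return []


class KVMHypervisor(BaseHypervisor):
    """Hypothetical KVM implementation of the assessment hook."""

    def assess_parameters(self, hvparams):
        findings = []
        if hvparams.get("cpu_type") == "host":
            findings.append(
                "cpu_type is currently set to 'host', please make sure all "
                "your cluster nodes have the exact same CPU type to allow "
                "live migrations.")
        # Assumption: a non-empty spice_bind means Spice is configured.
        if hvparams.get("spice_bind") and not hvparams.get("spice_use_tls"):
            findings.append(
                "Spice is configured but without TLS, please consider "
                "setting spice_use_tls to 'true' for additional security.")
        return findings


# Hypothetical registry mapping hypervisor names to their classes.
HYPERVISORS = {"kvm": KVMHypervisor}


def assess_cluster_hvparams(enabled_hypervisors, cluster_hvparams,
                            registry=HYPERVISORS):
    """Collect findings for every enabled hypervisor.

    The findings are only returned as a list of strings; nothing here
    raises or signals an error, mirroring the informational nature of
    the new verify section.
    """
    findings = []
    for name in enabled_hypervisors:
        hypervisor = registry[name]()
        findings.extend(
            hypervisor.assess_parameters(cluster_hvparams.get(name, {})))
    return findings
```

Because the base class returns an empty list, hypervisors without their own implementation simply contribute no findings, which matches the "currently only implemented for KVM" state described above.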
This feature is meant to uncover parameter values which might have become deprecated over time but have been preserved during cluster upgrades. It may also help new users to uncover potential 'bad' configuration, such as running instances with the 'qemu64' CPU type.
This feature is not meant to interfere with existing syntax validations of hypervisor parameters, as these are applied at different stages (e.g. during cluster verify but also when altering parameters on cluster/instance level).
As of now, only cluster-level settings are assessed, so as not to clutter the output of verify on larger clusters. The mechanism could also easily be adapted to implement something like gnt-instance assess $instance to manually check the parameters of any given instance.
This addresses #1606, but actually goes much further, as it implements the basis to assess any parameter.
Looking forward to your feedback and ideas for more parameters/values to check :-)
By the way, this is an example output:
gnt-cluster verify
Submitted jobs 186, 187
Waiting for job 186 ...
Tue Sep 7 14:33:03 2021 * Verifying cluster config
Tue Sep 7 14:33:03 2021 * Verifying cluster certificate files
Tue Sep 7 14:33:03 2021 * Verifying hypervisor parameters
Tue Sep 7 14:33:03 2021 * Verifying all nodes belong to an existing group
Waiting for job 187 ...
Tue Sep 7 14:33:04 2021 * Verifying group 'default'
Tue Sep 7 14:33:04 2021 * Gathering data (3 nodes)
Tue Sep 7 14:33:04 2021 * Gathering information about nodes (3 nodes)
Tue Sep 7 14:33:04 2021 * Gathering disk information (3 nodes)
Tue Sep 7 14:33:04 2021 * Verifying configuration file consistency
Tue Sep 7 14:33:04 2021 * Verifying node status
Tue Sep 7 14:33:04 2021 * Verifying instance status
Tue Sep 7 14:33:04 2021 * Verifying orphan volumes
Tue Sep 7 14:33:04 2021 * Verifying N+1 Memory redundancy
Tue Sep 7 14:33:04 2021 * Assessing cluster hypervisor parameters
Tue Sep 7 14:33:04 2021 - cpu_type is currently set to 'host', please make sure all your cluster nodes have the exact same CPU type to allow live migrations.
Tue Sep 7 14:33:04 2021 - Spice is configured but without TLS, please consider setting spice_use_tls to 'true' for additional security.
Tue Sep 7 14:33:04 2021 * Other Notes
Tue Sep 7 14:33:05 2021 * Hooks Results
I think I will move forward with this and close this PR (CC @apoikos @saschalucas). There has not been much feedback/activity here and it does not help to pile up PRs :-)
Looking at the mailing list, the default cpu_type seems to be an issue every now and then, and we really should improve the situation so that people stop running into problems there.