cloudstack
VR: remove old json config when start vmware/xenserver VPC VRs
Description
This PR fixes an issue where IPs are associated with the wrong interfaces when a VR is rebooted from CloudStack in a VMware/XenServer environment. However, with this change the VR will be broken if it is rebooted from vCenter or XenCenter (not from CloudStack).
Steps to reproduce the issue:
(1) create a VPC and a VPC tier
(2) acquire an IP in an additional IP range, and enable static NAT or PF/LB
(3) reboot the VR in CloudStack
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [x] Major
- [ ] Minor
- [ ] Trivial
Screenshots (if appropriate):
How Has This Been Tested?
@blueorangutan package
@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 2465
@sureshanaparti should this be considered on 4.16.1?
@blueorangutan test centos7 xcpng82
@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + xcpng82) has been kicked to run smoke tests
@weizhouapache doesn't this beat the whole idea of persistent configs in VRs? cc @sureshanaparti @DaanHoogland
@rohityadavcloud
Each time a VR is started from CloudStack, the config files are regenerated, so persistent config is not required in this scenario. I understand persistent config is helpful when the VR is rebooted in vCenter or XenCenter. However, it breaks the VPC VR in many scenarios when it is rebooted in CloudStack.
PS: this behaviour has already been applied to VPC VRs on KVM hosts, where there is no centralized management other than CloudStack.
@blueorangutan test centos7 vmware-7u2
@DaanHoogland unsupported parameters provided. Supported mgmt server os are: suse15, centos7, centos6, alma8, ubuntu18, ubuntu20, rocky8. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-rocky8, kvm-alma8, kvm-ubuntu18, kvm-ubuntu20, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, vmware-70u2, vmware-70u3, xenserver-65sp1, xenserver-71, xenserver-74, xcpng74, xcpng76, xcpng80, xcpng81, xcpng82
@blueorangutan test centos7 vmware-70u2
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + vmware-70u2) has been kicked to run smoke tests
Trillian test result (tid-3196) Environment: xcpng82 (x2), Advanced Networking with Mgmt server 7 Total time taken: 48954 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5938-t3196-xcpng82.zip Smoke tests completed. 91 look OK, 1 have errors Only failed tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_sys_vm_start | Failure | 0.10 | test_secondary_storage.py |
Trillian test result (tid-3212) Environment: vmware-70u2 (x2), Advanced Networking with Mgmt server 7 Total time taken: 35443 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5938-t3212-vmware-70u2.zip Smoke tests completed. 92 look OK, 0 have errors Only failed tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
@weizhouapache is this expected/intermittent? can you have a look?
test_secondary_storage.py
@DaanHoogland It should not be related to this PR. I have seen it a few times before. I will have a look.
@DaanHoogland @rohityadavcloud @sureshanaparti this conflicts with persistent config, which is useful when the VR is rebooted out-of-band (e.g. from vCenter, or by a command inside the VR). However, the NICs of a VPC VR are always plugged in the following order when the VPC VR is started from CloudStack:
(1) source NAT IP
(2) additional public IPs
(3) private gateway
(4) VPC tiers

This order is sometimes different from the order of the IPs in the json files inside the VR. This happens in many scenarios, for example:
(1) a public IP in an additional range is associated
(2) a private gateway is created after VPC tier creation
(3) a VPC tier is removed (not the last VPC tier)

When this happens, IPs are associated with the wrong interfaces when the VR is rebooted from CloudStack.
With this PR, CloudStack can ensure that the order is correct and IPs are associated with the correct interfaces. But if the VR is rebooted out-of-band, it will not work anymore, as the json config files are removed in bootstrap.
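To illustrate the ordering mismatch described above, here is a minimal Python sketch. It is not CloudStack code: the function names, the network names, and the assumption that `eth0` is the control NIC (so plugged NICs start at `eth1`) are all hypothetical and only model the fixed plug order listed in this comment.

```python
# Hypothetical model of the NIC plug order; not taken from the CloudStack code base.
PLUG_ORDER = ["source_nat", "additional_public", "private_gateway", "vpc_tier"]


def plug_nics(networks):
    """Return {device: network name} in the order NICs are plugged on start."""
    # sorted() is stable, so networks of the same kind keep their relative order.
    ordered = sorted(networks, key=lambda n: PLUG_ORDER.index(n["kind"]))
    # Assumption: eth0 is the control NIC, so plugged NICs start at eth1.
    return {f"eth{i}": n["name"] for i, n in enumerate(ordered, start=1)}


def stale_devices(saved, current):
    """List devices whose saved (json) network differs from the current one."""
    return sorted(dev for dev in current if saved.get(dev) != current[dev])


# State saved inside the VR: the additional public IP was hot-plugged last.
saved = {"eth1": "source-nat", "eth2": "tier-1", "eth3": "additional-public"}

# Networks of the same VPC when the VR is started again from CloudStack.
networks = [
    {"name": "source-nat", "kind": "source_nat"},
    {"name": "tier-1", "kind": "vpc_tier"},
    {"name": "additional-public", "kind": "additional_public"},
]

current = plug_nics(networks)
print(current)
print(stale_devices(saved, current))
```

In this model the additional public network moves from `eth3` to `eth2` after the restart, so any saved config that still maps it to `eth3` would put its IPs on the wrong interface.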
We need to decide which case we should support better (rebooting the VR from CloudStack, or out-of-band).
A feasible improvement is to remove the json files only if it is a VPC VR, so network VRs will not be impacted.
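The proposed restriction could look roughly like the sketch below. It is only an illustration in Python, not the change in this PR: the function name, the `router_type` parameter, and the `"vpcrouter"` type value are assumptions, and the config directory is passed in by the caller rather than hard-coded.

```python
from pathlib import Path


def remove_stale_json(config_dir, router_type):
    """Delete *.json files in config_dir only for VPC VRs.

    Returns the names of the removed files. Both the "vpcrouter" value and
    the idea of passing the type as a string are assumptions for this sketch.
    """
    if router_type != "vpcrouter":
        # Network (non-VPC) VRs keep their persistent config untouched.
        return []
    removed = []
    for path in sorted(Path(config_dir).glob("*.json")):
        path.unlink()
        removed.append(path.name)
    return removed
```

With this shape, a network VR that is rebooted out-of-band still finds its json files, while a VPC VR always starts from the config regenerated by CloudStack.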
@sureshanaparti @weizhouapache I think we should investigate further whether this can be solved in a non-conflicting way for both CloudStack-controlled and out-of-band reboots. I suggest moving this to milestone 4.17.
@blueorangutan package
@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 2833
@blueorangutan test matrix
@weizhouapache a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests
Trillian test result (tid-3561) Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7 Total time taken: 33673 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5938-t3561-xenserver-71.zip Smoke tests completed. 92 look OK, 0 have errors Only failed tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
@weizhouapache is this ready for review or still needs more work?
@nvazquez this code is ready for review and testing. It requires manual testing of rebooting the VR from inside it (or out-of-band).
I am working on fixing the component tests.
clgtm, seems to do what it says on the tin. One question: for VMware we query and get only one VM and reconfigure it, while for Xen we get a list and iterate over it. Is this difference real or just a result of the API definitions, i.e. can there be more than one on Xen?
Sorry @DaanHoogland, can you clarify the question?
The process for passing the cmdline to VRs differs between hypervisors.
@blueorangutan package
@weizhouapache a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.