Create MDS unittest for specific scenario
We assume the following scenario can result in bad behavior when running the 'ensure_safety' check for a vdisk (a sketch of a unittest driving this scenario is given right after the list):
- Safety is configured at 2
- Volume V has master on node1, slave on node2
- Master node dies, HA kicks in, volume gets moved by volumedriver to node2
- Volumedriver sends owner_changed event and fwk runs ensure_safety for said volume
- The logging indicated 2 reasons for reconfiguration and an error:
  - Not enough safety
  - Not enough services in use in primary domain
  - Failed to update the metadata backend configuration
- The framework eventually configured node3 as master and node4 as slave, removing node2 from the config altogether
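Since the ticket asks for a unittest, below is a minimal, self-contained sketch of the expected call sequence for this scenario. The fake client, the apply_two_phase_config helper and the nodeX identifiers are illustrative stand-ins that only mirror the try/except quoted further down; they are not the framework's real test fixtures or API.

```python
import unittest


class FakeStorageDriverClient(object):
    """Stand-in for the real storagedriver client: records every config it is asked to apply."""
    def __init__(self):
        self.applied_configs = []

    def update_metadata_backend_config(self, volume_id, metadata_backend_config, req_timeout_secs):
        self.applied_configs.append(list(metadata_backend_config))


def apply_two_phase_config(client, volume_id, configs_no_ex_master, configs_all):
    """Mirrors the try/except quoted below: first the config without the ex-master, then the full config."""
    if len(configs_no_ex_master) != len(configs_all):
        client.update_metadata_backend_config(volume_id=volume_id,
                                               metadata_backend_config=configs_no_ex_master,
                                               req_timeout_secs=5)
    client.update_metadata_backend_config(volume_id=volume_id,
                                          metadata_backend_config=configs_all,
                                          req_timeout_secs=5)


class EnsureSafetyScenarioTest(unittest.TestCase):
    def test_post_ha_master_survives_reconfiguration(self):
        # Scenario: safety 2, node1 died, node2 is the post-HA master; the framework
        # reshuffles to node3/node4, but node2 must remain part of the final config.
        client = FakeStorageDriverClient()
        apply_two_phase_config(client, 'vol-1',
                               configs_no_ex_master=['node3', 'node4'],
                               configs_all=['node3', 'node4', 'node2'])
        self.assertIn('node2', client.applied_configs[-1])


if __name__ == '__main__':
    unittest.main()
```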
The reconfiguration is performed by the following fragment (quoted from MDSServiceController, reformatted here for readability):

```python
try:
    if len(configs_no_ex_master) != len(configs_all):
        # 1st update: push the configuration without the ex-master
        vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                                  metadata_backend_config=MDSMetaDataBackendConfig(configs_no_ex_master),
                                                                  req_timeout_secs=5)
    # 2nd update: push the full configuration (per the analysis below, the one that still contained node2)
    vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                              metadata_backend_config=MDSMetaDataBackendConfig(configs_all),
                                                              req_timeout_secs=5)
except Exception:
    MDSServiceController._logger.exception('MDS safety: vDisk {0}: Failed to update the metadata backend configuration'.format(vdisk.guid))
    raise Exception('MDS configuration for volume {0} with guid {1} could not be changed'.format(vdisk.name, vdisk.guid))
```
We assume that the 1st update_metadata_backend_config call was initiated and timed out after 5 seconds, so the 2nd update, which presumably contained the original master node (node2), was never executed. This cannot be verified from the available logging, however. On the voldrv side we did see that the actual updateMetadataBackendConfig succeeded, but it took 386s to complete.
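Under that assumption, the suspected failure mode could be pinned down in a similar sketch: the first update raises a (simulated) timeout, so the full config that still contains node2 is never sent. Again, all names are illustrative stand-ins, and RuntimeError merely simulates a timeout since the real client's timeout exception type is not visible in the quoted fragment.

```python
import unittest


class TimingOutClient(object):
    """Stand-in client whose first update 'times out' (simulated with RuntimeError)."""
    def __init__(self):
        self.calls = []

    def update_metadata_backend_config(self, volume_id, metadata_backend_config, req_timeout_secs):
        self.calls.append(list(metadata_backend_config))
        if len(self.calls) == 1:
            raise RuntimeError('request timed out after {0}s'.format(req_timeout_secs))


def apply_two_phase_config(client, volume_id, configs_no_ex_master, configs_all):
    # Same shape as the quoted ensure_safety fragment.
    if len(configs_no_ex_master) != len(configs_all):
        client.update_metadata_backend_config(volume_id=volume_id,
                                               metadata_backend_config=configs_no_ex_master,
                                               req_timeout_secs=5)
    client.update_metadata_backend_config(volume_id=volume_id,
                                          metadata_backend_config=configs_all,
                                          req_timeout_secs=5)


class EnsureSafetyTimeoutTest(unittest.TestCase):
    def test_timeout_on_first_update_prevents_second(self):
        client = TimingOutClient()
        with self.assertRaises(RuntimeError):
            apply_two_phase_config(client, 'vol-1',
                                   configs_no_ex_master=['node3', 'node4'],
                                   configs_all=['node3', 'node4', 'node2'])
        # Only the node2-less config was ever sent; if the volumedriver applies it
        # anyway once the slow call finishes (as seen after 386s), node2 is gone.
        self.assertEqual(client.calls, [['node3', 'node4']])


if __name__ == '__main__':
    unittest.main()
```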