(Ceph) Unable to activate HA on an XCP cluster whose network storage implements the VDI_ATTACH_OFFLINE feature
If the shared storage of an XCP cluster implements the VDI_ATTACH_OFFLINE feature, HA cannot be activated. This happens because of a bug in the script /opt/xensource/bin/static-vdis.
If the VDI_ATTACH_OFFLINE feature is declared in the storage plugin, the function add(session, vdi_uuid, reason) at line 106 of the script tries to execute sr = sr_attach(ty, device_config) (at line 152) to attach the VDI holding the metadata required for HA. But the variable device_config is not initialized properly before the call: it has none of the keys required by the storage plugin API for attaching a VDI.
Sample of traceback:
  File "/usr/libexec/xapi-storage-script/volume/org.xen.xapi.storage.rbdsr/SR.attach", line 178, in attach
    configuration['sr_uuid'])
KeyError: 'sr_uuid'
Traceback (most recent call last):
  File "/usr/libexec/xapi-storage-script/volume/org.xen.xapi.storage.rbdsr/SR.attach", line 343, in <module>
    cmd.attach()
  File "/usr/lib/python2.7/site-packages/xapi/storage/api/v4/volume.py", line 1373, in attach
    raise e
KeyError: 'sr_uuid'
Traceback (most recent call last):
  File "/opt/xensource/bin/static-vdis", line 323, in <module>
    add(session, sys.argv[2], sys.argv[3])
  File "/opt/xensource/bin/static-vdis", line 152, in add
    sr = sr_attach(ty, device_config)
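For illustration, here is a minimal sketch of the kind of initialization I would expect before the sr_attach call. The XenAPI lookups (VDI.get_SR, SR.get_PBDs, PBD.get_device_config) are my assumption about where the plugin-specific keys could come from; this is not an actual patch:

    # Hypothetical sketch, not the real static-vdis code: populate device_config
    # from the VDI's SR before handing it to the storage plugin.
    vdi_ref = session.xenapi.VDI.get_by_uuid(vdi_uuid)
    sr_ref = session.xenapi.VDI.get_SR(vdi_ref)
    sr_uuid = session.xenapi.SR.get_uuid(sr_ref)

    # Take the device-config from one of the SR's PBDs; this is assumed to carry
    # the plugin-specific keys (pool name, credentials, ...).
    pbd_ref = session.xenapi.SR.get_PBDs(sr_ref)[0]
    device_config = session.xenapi.PBD.get_device_config(pbd_ref)
    device_config['sr_uuid'] = sr_uuid  # the key SR.attach indexes in the traceback

    sr = sr_attach(ty, device_config)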
I'm not sure I get it. Is it related to the Ceph thing? If yes, you should probably file a bug report in the Ceph repo.
Yes, it's Ceph related. I had already posted an issue there before posting here: https://github.com/rposudnevskiy/RBDSR/issues/94
But I didn't find any issue on the side of the Ceph plugin. The Physical Block Device's device_config doesn't provide an sr_uuid record and, as I understand it, shouldn't. But a valid sr_uuid key with a value is obviously required for the call sr_attach(ty, device_config).
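To illustrate why the KeyError appears (this is only a sketch of the pattern, not the actual RBDSR code): the plugin's SR.attach receives device_config as its configuration dict and indexes it directly, so an uninitialized dict fails exactly as in the traceback above.

    # Illustrative sketch only (assumed shape of the plugin-side attach):
    def attach(dbg, configuration):
        sr_uuid = configuration['sr_uuid']  # raises KeyError if the caller omitted the key
        return "attached SR %s" % sr_uuid

    attach("demo", {})  # KeyError: 'sr_uuid' -- the same failure as in the traceback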
Does HA work on another shared SR configuration? (e.g. just using NFS?)
I cannot test it with NFS right now. Some time later I may test it with iSCSI. But logically, in the context of storage plugins, there is no way to know which SR should be attached if several SRs are served by the same plugin. Obviously it has to be defined explicitly somewhere outside of the plugins.
Logically everything should work, so I prefer to do a differential diagnosis; it's almost always the fastest way to pinpoint the issue ;)
I agree. But I have pretty limited resources to test everything as it should be tested. :((
Any small NFS share can do the trick, even in a VM itself for the sake of testing :)
@rushikeshjadhav could you have a look to see if this bug report is still relevant to XCP-ng 8.2?
Tested it on XCP-ng 8.2 and found it to be working fine. It takes around 31s to activate HA on a CephFS type of SR and around 3s to disable it.
Thanks. For Ceph RBD our docs say "Known issue: this SR is not allowed to be used for HA state metadata due to LVM backend restrictions within XAPI drivers, so if you want to use HA, you will need to create another type of storage for HA metadata". I suppose this is still true?
Yes, Ceph RBD mounted as an LVM SR cannot be used as a heartbeat SR.