
(Ceph) Unable to activate HA of XCP cluster with network storage that has feature VDI_ATTACH_OFFLINE

Open northbear opened this issue 7 years ago • 11 comments

If the shared storage of an XCP cluster implements the feature VDI_ATTACH_OFFLINE, HA cannot be activated. This happens because of a bug in the script /opt/xensource/bin/static-vdis. If the feature VDI_ATTACH_OFFLINE is declared in the storage plugin, the function add(session, vdi_uuid, reason) (line 106 of the script) tries to execute sr = sr_attach(ty, device_config) (line 152) to attach the VDI with the metadata required for HA. But the variable device_config is not initialized properly before the call: it lacks the keys required by the storage plugin API for attaching the VDI.

Sample traceback:

  File "/usr/libexec/xapi-storage-script/volume/org.xen.xapi.storage.rbdsr/SR.attach", line 178, in attach
    configuration['sr_uuid'])
KeyError: 'sr_uuid'
Traceback (most recent call last):
  File "/usr/libexec/xapi-storage-script/volume/org.xen.xapi.storage.rbdsr/SR.attach", line 343, in <module>
    cmd.attach()
  File "/usr/lib/python2.7/site-packages/xapi/storage/api/v4/volume.py", line 1373, in attach
    raise e
KeyError: 'sr_uuid'
Traceback (most recent call last):
  File "/opt/xensource/bin/static-vdis", line 323, in <module>
    add(session, sys.argv[2], sys.argv[3])
  File "/opt/xensource/bin/static-vdis", line 152, in add
    sr = sr_attach(ty, device_config)
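
To make the failure mode concrete, here is a minimal Python sketch of what the traceback suggests is happening. It is not the real static-vdis or RBDSR code: plugin_sr_attach, missing_keys, the "cluster" key, and the UUID value are all hypothetical stand-ins, and the only detail taken from the report is that the plugin indexes configuration['sr_uuid'] while device_config has no such key.

```python
# Hypothetical stand-in for the plugin's SR.attach; NOT the real RBDSR code.
def plugin_sr_attach(configuration):
    # Mirrors the failing lookup from the traceback: the plugin reads
    # configuration['sr_uuid'] directly, so a missing key raises KeyError.
    return "attached:" + configuration["sr_uuid"]


def missing_keys(device_config, required=("sr_uuid",)):
    """Return the required keys that are absent from device_config."""
    return [key for key in required if key not in device_config]


# Assumed shape of a PBD device_config that carries no sr_uuid entry.
# The "cluster" key is made up for illustration.
device_config = {"cluster": "ceph"}

try:
    plugin_sr_attach(device_config)
    outcome = "ok"
except KeyError as err:
    outcome = "KeyError: " + str(err)

# Illustration only: the call succeeds once sr_uuid is supplied explicitly
# by the caller. The UUID value here is a placeholder.
patched = dict(device_config, sr_uuid="hypothetical-sr-uuid")
result = plugin_sr_attach(patched)
```

A check like missing_keys before the attach call would at least turn the bare KeyError into a readable error message. Whether sr_uuid should be added by the caller or derived some other way is exactly the open question in this report.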

northbear avatar Oct 24 '18 13:10 northbear

I'm not sure I get it. Is this related to Ceph? If so, you should probably file a bug report in the Ceph repo.

olivierlambert avatar Oct 24 '18 18:10 olivierlambert

Yes, it's Ceph related. I already posted an issue there before posting here: https://github.com/rposudnevskiy/RBDSR/issues/94 But I didn't find any issue on the side of the Ceph plugin. The Physical Block Device's device_config doesn't provide an sr_uuid record and, as I understand it, shouldn't. But a valid sr_uuid key with a value is obviously required in the call sr_attach(ty, device_config).

northbear avatar Oct 25 '18 08:10 northbear

Does HA work on another shared SR configuration (e.g. just using NFS)?

olivierlambert avatar Oct 25 '18 08:10 olivierlambert

I cannot test it with NFS right now. Some time later I may test it with iSCSI. But logically, within the context of storage plugins, there is no way to tell which SR should be attached if you have several SRs served by the same plugin. Obviously it should be defined explicitly somewhere outside of the plugins.

northbear avatar Oct 25 '18 10:10 northbear

Logically everything should work, so I prefer to do a differential diagnosis; this is almost always the fastest way to pinpoint the issue ;)

olivierlambert avatar Oct 25 '18 10:10 olivierlambert

I agree. But I have pretty limited resources to test everything as it should be tested. :((

northbear avatar Oct 25 '18 12:10 northbear

Any small NFS share can do the trick, even in a VM itself for the sake of testing :)

olivierlambert avatar Oct 25 '18 12:10 olivierlambert

@rushikeshjadhav could you have a look to see if this bug report is still relevant to XCP-ng 8.2?

stormi avatar Nov 30 '20 14:11 stormi

Tested it on XCP-ng 8.2 and found it to be working fine. It takes around 31s to activate HA on a CephFS SR and around 3s to disable it.

rushikeshjadhav avatar Dec 10 '20 09:12 rushikeshjadhav

Thanks. For Ceph RBD our docs say "Known issue: this SR is not allowed to be used for HA state metadata due to LVM backend restrictions within XAPI drivers, so if you want to use HA, you will need to create another type of storage for HA metadata". I suppose this is still true?

stormi avatar Dec 10 '20 10:12 stormi

Yes, Ceph RBD mounted as an LVM SR cannot be used as a heartbeat SR.

rushikeshjadhav avatar Dec 10 '20 10:12 rushikeshjadhav