
SOLIDFIRE: "SolidFire" plugin doesn't work for ROOT volumes with VMware (6.5)

Open andrijapanicsb opened this issue 6 years ago • 11 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
SolidFire plugin ("SolidFire" as opposed to "SolidFire Shared")
CLOUDSTACK VERSION
4.13 (master atm), but also observed by another community member on 4.11.3
CONFIGURATION
OS / ENVIRONMENT

VMware 6.5 tested

SUMMARY

Adding SolidFire Primary Storage via SolidFire plugin for VMware 6.5 fails with errors in mounting Datastore in ESXi hosts and raises the specific error in mgmt logs.

STEPS TO REPRODUCE

In vCenter, add an iSCSI Software Adapter to each ESXi host and configure the proper network binding to vSwitchXXX for the iSCSI traffic (i.e. VLAN XXX, so that ESXi can communicate with the SVIP on that VLAN). Then add SolidFire as Primary Storage (zone-wide; same problem with cluster-wide) with protocol "Custom", provider "SolidFire", the "Managed" box ticked and the proper URL. Adding SF as Primary Storage is successful.

Try to spin up a VM - that is when things fail, after a minute or so.

Observing vCenter, the following things happen (also check the screenshot):

  • Static iSCSI target is added to ESXI hosts
  • Rescanning HBAs
  • Creating datastore same size as the volume/template itself with the name ending in "Centos-5.3-x64" or similar (name of the template)
  • Deploying OVF template
  • Unregistering VM
  • Moving files around
  • unmounting VMFS
  • Removing iSCSI static targets
  • Rescan HBA
  • Again adding iSCSI static targets ???
  • Rescan HBAs
  • Rescan VMFS
  • RENAME datastore (from the template-like name to the root-volume-like name, ending with ROOT-XXX.YY) (this is probably where the problem happens ???)
  • unmount datastore
  • remove iSCSI targets.

The error from the ACS is: message: Datastore '-iqn.2010-01.com.solidfire:hl1k.root-32.29-0' is not accessible. No connected and accessible host is attached to this datastore

The problem is that this datastore (in its latest, renamed state, still attached to the ESXi hosts) is unmounted, but it can't be removed, nor can I mount it. If I try to mount it manually, I get the vCenter message "Operation failed, diagnostics report: Unable to find volume uuid[5d7abd9a-273aa9d5-bffe-1e00d4010711] lvm [snap-329aa3ea-5d7abd01-a5c83210-c87c-1e00d4010711] devices"

Screenshot from vCenter attached - note that the last 2 entries (on top) are my attempt to manually mount an existing SF datastore, i.e. there are zero failures on the vCenter side while ACS is doing its job - something is failing on the ACS side.

image

andrijapanicsb avatar Sep 13 '19 20:09 andrijapanicsb

@skattoju4 /CC @mike-tutkowski FYI ^^^

andrijapanicsb avatar Sep 13 '19 20:09 andrijapanicsb

Small update - it works fine for DATA disks since no renaming of datastore in play.

andrijapanicsb avatar Sep 23 '19 09:09 andrijapanicsb

Seems like removing the static iSCSI targets is the step that breaks the whole thing:

The last few steps from the original screenshot:

  • datastore is renamed: "Renamed datastore from snap-13e16b15-iqn.2010-01.com.solidfire:hi1k.centos53-x64.66-0 to -iqn.2010-01.com.solidfire:hl1k.root-55.67-0" (fine)
  • VMFS is unmounted (per "plan" or not?)
  • static iSCSI target is removed - BUT without first deleting the datastore

Since datastore is NOT deleted, but its static iSCSI target is removed - vCenter will complain that the iSCSI path is no longer available for that datastore.

If I manually add the static iSCSI target for "iqn.2010-01.com.solidfire:hl1k.root-55.67-0", I can later mount the (existing) datastore as expected, etc.
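To make the suspected ordering problem concrete, here is a minimal Python sketch that replays a task sequence and flags any static target removal that happens while a datastore on that target still exists. The event names and the `Event` type are hypothetical labels for the steps seen in the vCenter task list, not real vCenter or CloudStack identifiers, and the model assumes one datastore per IQN.

```python
from dataclasses import dataclass

# Hypothetical event names for illustration; not real vCenter task IDs.
ADD_TARGET = "add_static_target"
REMOVE_TARGET = "remove_static_target"
CREATE_DS = "create_datastore"
UNMOUNT_DS = "unmount_datastore"
DELETE_DS = "delete_datastore"


@dataclass(frozen=True)
class Event:
    action: str
    iqn: str


def find_orphaning_removals(events):
    """Return IQNs whose static target was removed while a datastore
    backed by that target was still registered (unmounted or not)."""
    live_datastores = set()
    orphaned = []
    for ev in events:
        if ev.action == CREATE_DS:
            live_datastores.add(ev.iqn)
        elif ev.action == DELETE_DS:
            live_datastores.discard(ev.iqn)
        elif ev.action == REMOVE_TARGET and ev.iqn in live_datastores:
            # Unmounting alone does not delete the datastore, so the
            # removal leaves it without any iSCSI path.
            orphaned.append(ev.iqn)
    return orphaned


root_iqn = "iqn.2010-01.com.solidfire:hl1k.root-55.67-0"
events = [
    Event(ADD_TARGET, root_iqn),
    Event(CREATE_DS, root_iqn),   # datastore shows up after rescan/rename
    Event(UNMOUNT_DS, root_iqn),
    Event(REMOVE_TARGET, root_iqn),
]
print(find_orphaning_removals(events))
# prints ['iqn.2010-01.com.solidfire:hl1k.root-55.67-0']
```

If a `delete_datastore` event were to precede the target removal, the check would return an empty list, which is the ordering I would have expected.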

Hope this helps further troubleshooting and fixing the issue @skattoju4 /CC @mike-tutkowski

andrijapanicsb avatar Sep 23 '19 10:09 andrijapanicsb

I am not sure what the severity is for this @mike-tutkowski @skattoju4 @andrijapanicsb . Is this something we are going to support or should we just add a line in docs somewhere? @mike-tutkowski is there a dev you know busy with this plugin?

DaanHoogland avatar Dec 04 '20 11:12 DaanHoogland

I think this was the issue related to the one opened by Christian from the Fraunhofer institute. He sent us (CloudOps) some logs and we were going to troubleshoot/debug in his environment, however I think priorities changed on their end so this effort was put on hold.

skattoju4 avatar Dec 04 '20 17:12 skattoju4

ping @skattoju cc @swill @syed @pdion891 - any update on this?

rohityadavcloud avatar Aug 09 '21 09:08 rohityadavcloud

(moved this back to unplanned unless we hear any devs picking this up)

rohityadavcloud avatar Aug 09 '21 09:08 rohityadavcloud

Hello everyone,

I am from the NetApp support team. There is no active development going on for the ACS-SF plugin at NetApp, but since this seems to have been a dangling case for quite some time, I thought I would drop some input based on what makes sense to me.

Removing the "iSCSI targets" and then running a storage adapter rescan should ideally remove all traces of the unsignatured disks previously available through the iSCSI server/target ... If the datastore is not removed from the vSphere inventory, then this needs to be checked in the vSphere and ESXi logs.

My recommendation would be to trace this in the vpxd log (vSphere log), vpxa (ESXi log), hostd.log (ESXi log) and vmkernel.log (ESXi log).
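As a rough aid for that tracing, a small Python sketch like the one below could pull every line mentioning the affected IQN out of whichever log files have been collected. It is a plain substring search and makes no assumption about the log formats; the file names in the usage comment are hypothetical local copies from a log bundle.

```python
from pathlib import Path


def grep_logs(paths, needle):
    """Return (file name, line number, line) for each line containing needle."""
    hits = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # skip logs that were not collected
        with p.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if needle in line:
                    hits.append((p.name, lineno, line.rstrip("\n")))
    return hits


# Usage (hypothetical paths to logs extracted from a support bundle):
# for name, lineno, line in grep_logs(
#         ["vpxd.log", "vpxa.log", "hostd.log", "vmkernel.log"],
#         "iqn.2010-01.com.solidfire:hl1k.root-32.29-0"):
#     print(f"{name}:{lineno}: {line}")
```

Sorting the hits by timestamp across files would still have to be done by hand, since each log uses its own line format.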

ACS - SF plugin throwing the error as message: .. Datastore '-iqn.2010-01.com.solidfire:hl1k.root-32.29-0' is not accessible. No connected and accessible host is attached to this datastore ..

makes sense, because the hosts do not have any clue about the target now that the iSCSI target(s) has (have) been removed.

By any chance do we have the vsphere + esxi log bundle available ???

Regards, KC

ikchakraborty avatar Jun 02 '22 11:06 ikchakraborty

@ikchakraborty I think we have not got any infrastructure anywhere to support this. cc @andrijapanicsb @rohityadavcloud

DaanHoogland avatar Jun 03 '22 08:06 DaanHoogland

cc @shwstppr @pdion891 any update on this, does this work now, maybe closed?

rohityadavcloud avatar Aug 17 '22 12:08 rohityadavcloud

Hi, my name is Aarushi Soni. I want to contribute to this issue. Please guide me through the process.

aarushisoni avatar Sep 08 '22 08:09 aarushisoni

According to #6548, this issue does not exist in ACS 4.16.1 with VMware 6.7.

this may be closed

weizhouapache avatar Jan 27 '23 08:01 weizhouapache