
VM HA failing to restart on another primary storage

Status: Open. DaanHoogland opened this issue 3 weeks ago • 0 comments

Action: Enable HA on a user VM, then put the primary storage pool that holds its disks into maintenance.
Expected Result: The HA VM automatically restarts on the alternate storage pool and remains stable after maintenance is cancelled.
Actual Result: The HA VM remains in the Stopped state and does NOT auto-restart on the alternate pool.

Evidence

Baseline: Environment Setup

Zone, cluster, and hosts were confirmed operational.

Primary Storage 1 and Primary Storage 2 are both in the Up state:

(localcloud) 🐱 > list storagepools filter=name,id,state
{
  "count": 2,
  "storagepool": [
    {
      "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Up"
    },
    {
      "id": "d2d00507-fb67-3329-a32d-775c637a0ba3",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}
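Before a pool is put into maintenance, an HA restart elsewhere needs at least one other pool in the Up state. A minimal Python sketch of that precondition check, assuming the `list storagepools` output above has been captured as JSON text; the function name `alternate_up_pools` is hypothetical and not part of CloudStack or CloudMonkey.

```python
import json

# JSON as printed by `list storagepools filter=name,id,state` above.
POOLS_JSON = """
{
  "count": 2,
  "storagepool": [
    {"id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
     "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1", "state": "Up"},
    {"id": "d2d00507-fb67-3329-a32d-775c637a0ba3",
     "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri2", "state": "Up"}
  ]
}
"""


def alternate_up_pools(pools_json: str, maintenance_pool_id: str) -> list[str]:
    """Hypothetical helper: names of pools, other than the one entering
    maintenance, that are currently Up."""
    pools = json.loads(pools_json).get("storagepool", [])
    return [
        p["name"]
        for p in pools
        if p["id"] != maintenance_pool_id and p["state"] == "Up"
    ]


if __name__ == "__main__":
    # Primary Storage 1 is the pool that is put into maintenance in this test.
    print(alternate_up_pools(POOLS_JSON, "1383bf9b-b82e-3566-9bca-769b4b69227d"))
```

For this environment the check returns only the pri2 pool, so an alternate target did exist when maintenance was enabled.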

Database is clean, with no stale records in storage_pool_work:

mysql> SELECT * FROM storage_pool_work;
Empty set (0.00 sec)

Test Setup: Created HA-enabled Service Offering and Deployed Test VMs

(localcloud) 🐱 > create serviceoffering name=Small-HA-Instance displaytext="Small Instance with HA" cpunumber=1 cpuspeed=1000 memory=512 offerha=true
{
  "serviceoffering": {
    "id": "e601bccc-9446-4c31-bae5-5f873bb6c712",
    "name": "Small-HA-Instance",
    "offerha": true
  }
}

Deployed two VMs, one with HA and one without:

(localcloud) 🐱 > list virtualmachines filter=name,id,instancename,state,hostname,haenable
{
  "count": 2,
  "virtualmachine": [
    {
      "haenable": false,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "id": "b65f682e-08de-4e8f-afce-004f4d4054f2",
      "instancename": "i-2-3-VM",
      "name": "test-vm-no-ha",
      "state": "Running"
    },
    {
      "haenable": true,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "id": "b7cffaf4-04c6-4c2a-9bd5-2c2ac7fe03df",
      "instancename": "i-2-5-VM",
      "name": "test-vm-ha",
      "state": "Running"
    }
  ]
}

Pre-Maintenance: VM Storage Location

HA VM (i-2-5-VM) has its disks on Primary Storage 1 (ID: 1383bf9b-b82e-3566-9bca-769b4b69227d):

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-5-VM | grep "source file"
      <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/45e68b66-c706-43e9-82bb-5e421411d072' index='2'/>
        <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/a4658949-d4f9-11f0-b96d-1e001100042d'/>

Non-HA VM (i-2-3-VM) has its disks on Primary Storage 2 (ID: d2d00507-fb67-3329-a32d-775c637a0ba3):

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-3-VM | grep "source file"
      <source file='/mnt/d2d00507-fb67-3329-a32d-775c637a0ba3/21b3676b-993d-4245-ad01-73de2afaba51' index='2'/>
        <source file='/mnt/d2d00507-fb67-3329-a32d-775c637a0ba3/a4658949-d4f9-11f0-b96d-1e001100042d'/>
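The pool a disk lives on can be read from its path: in the output above, each `source file` sits under `/mnt/<pool id>/`. A minimal Python sketch that extracts those pool IDs from the grep output, assuming that mount layout holds (it does for both VMs shown here); `pools_backing_vm` is a hypothetical helper name.

```python
import re

# Lines printed by `virsh dumpxml i-2-5-VM | grep "source file"` above.
HA_VM_SOURCES = """
      <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/45e68b66-c706-43e9-82bb-5e421411d072' index='2'/>
        <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/a4658949-d4f9-11f0-b96d-1e001100042d'/>
"""

# Assumption: each pool is mounted at /mnt/<36-character pool UUID>/.
MOUNT_RE = re.compile(r"source file='/mnt/([0-9a-f-]{36})/")


def pools_backing_vm(grep_output: str) -> set[str]:
    """Return the set of pool UUIDs that appear in the VM's disk source paths."""
    return set(MOUNT_RE.findall(grep_output))


if __name__ == "__main__":
    print(pools_backing_vm(HA_VM_SOURCES))
```

Both lines (the disk and what appears to be its backing image) resolve to the same pool, confirming that every file the HA VM needs is on Primary Storage 1.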

Enabled Maintenance on Primary Storage 1

(localcloud) 🐱 > enableStorageMaintenance id=1383bf9b-b82e-3566-9bca-769b4b69227d
{
  "storagepool": {
    "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
    "jobid": "e2253689-fe35-4967-9651-a43016484f8a",
    "jobstatus": 0,
    "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
    "state": "Maintenance"
  }
}

Post-Maintenance Results

Storage Pool State:

(localcloud) 🐱 > list storagepools filter=name,id,state
{
  "count": 2,
  "storagepool": [
    {
      "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Maintenance"
    },
    {
      "id": "d2d00507-fb67-3329-a32d-775c637a0ba3",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}

VM states: the HA VM did NOT auto-restart:

(localcloud) 🐱 > list virtualmachines filter=name,instancename,state,hostname,haenable
{
  "count": 2,
  "virtualmachine": [
    {
      "haenable": false,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "instancename": "i-2-3-VM",
      "name": "test-vm-no-ha",
      "state": "Running"
    },
    {
      "haenable": true,
      "instancename": "i-2-5-VM",
      "name": "test-vm-ha",
      "state": "Stopped"
    }
  ]
}
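The failure condition in this listing is an HA-enabled VM that is not Running. A minimal Python sketch that flags such VMs from the `list virtualmachines` JSON, assuming the output above is captured as text; `stranded_ha_vms` is a hypothetical helper name.

```python
import json

# JSON as printed by `list virtualmachines filter=name,instancename,state,hostname,haenable` above.
VMS_JSON = """
{
  "count": 2,
  "virtualmachine": [
    {"haenable": false, "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
     "instancename": "i-2-3-VM", "name": "test-vm-no-ha", "state": "Running"},
    {"haenable": true, "instancename": "i-2-5-VM",
     "name": "test-vm-ha", "state": "Stopped"}
  ]
}
"""


def stranded_ha_vms(vms_json: str) -> list[str]:
    """Hypothetical helper: instance names of HA-enabled VMs that are not Running."""
    vms = json.loads(vms_json).get("virtualmachine", [])
    return [
        vm["instancename"]
        for vm in vms
        if vm.get("haenable") and vm.get("state") != "Running"
    ]


if __name__ == "__main__":
    print(stranded_ha_vms(VMS_JSON))
```

Only i-2-5-VM is flagged; note that it also has no `hostname`, consistent with it not being placed on any host.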

The HA VM (i-2-5-VM) is absent from the hypervisor's domain list:

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh list --all
 Id   Name       State
--------------------------
 1    v-2-VM     running
 2    r-4-VM     running
 3    i-2-3-VM   running
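`virsh list --all` includes inactive domains as well as running ones, so a VM missing from it entirely is not defined on that host at all. A minimal Python sketch that parses the table above into a name-to-state map, assuming the default three-column layout; `virsh_domains` is a hypothetical helper name.

```python
# Output of `virsh list --all` above.
VIRSH_LIST = """\
 Id   Name       State
--------------------------
 1    v-2-VM     running
 2    r-4-VM     running
 3    i-2-3-VM   running
"""


def virsh_domains(output: str) -> dict[str, str]:
    """Hypothetical helper: map domain name -> state from `virsh list --all` text."""
    domains = {}
    for line in output.splitlines()[2:]:  # skip the header and separator lines
        fields = line.split()
        if len(fields) >= 3:
            # Columns are Id, Name, State; the state may be two words ("shut off").
            domains[fields[1]] = " ".join(fields[2:])
    return domains


if __name__ == "__main__":
    domains = virsh_domains(VIRSH_LIST)
    print("i-2-5-VM" in domains)
```

Here the check prints `False`: the HA VM is neither running nor defined-but-shut-off on kvm2.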

Database Evidence:

mysql> SELECT * FROM storage_pool_work;
+----+---------+-------+-------------------------+---------------------------+----------------+
| id | pool_id | vm_id | stopped_for_maintenance | started_after_maintenance | mgmt_server_id |
+----+---------+-------+-------------------------+---------------------------+----------------+
|  1 |       1 |     1 |                       1 |                         1 | 32985634047021 |
|  2 |       1 |     5 |                       1 |                         0 | 32985634047021 |
+----+---------+-------+-------------------------+---------------------------+----------------+
2 rows in set (0.00 sec)

Unlike vm_id=1, which shows started_after_maintenance=1, vm_id=5 (HA VM i-2-5-VM) has stopped_for_maintenance=1 but started_after_maintenance=0 ✗ (it should be 1).
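The symptom in storage_pool_work is a row where a VM was stopped for maintenance but never marked as started afterwards. A minimal Python sketch of that filter over the two rows shown above; the `PoolWork` type and `not_restarted` helper are hypothetical, and the SQL in the comment is simply the equivalent query against the columns shown.

```python
from typing import NamedTuple


class PoolWork(NamedTuple):
    """One row of storage_pool_work, with the columns shown in the query output."""
    id: int
    pool_id: int
    vm_id: int
    stopped_for_maintenance: int
    started_after_maintenance: int
    mgmt_server_id: int


# Rows copied from `SELECT * FROM storage_pool_work;` above.
ROWS = [
    PoolWork(1, 1, 1, 1, 1, 32985634047021),
    PoolWork(2, 1, 5, 1, 0, 32985634047021),
]


def not_restarted(rows: list[PoolWork]) -> list[int]:
    """Return vm_ids stopped for maintenance but not started afterwards.

    Equivalent SQL:
      SELECT vm_id FROM storage_pool_work
      WHERE stopped_for_maintenance = 1 AND started_after_maintenance = 0;
    """
    return [
        r.vm_id
        for r in rows
        if r.stopped_for_maintenance and not r.started_after_maintenance
    ]


if __name__ == "__main__":
    print(not_restarted(ROWS))
```

Only vm_id=5, the HA VM, is returned.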

Originally posted by @rosi-shapeblue in https://github.com/apache/cloudstack/issues/11789#issuecomment-3632820499

DaanHoogland, Dec 09 '25 15:12