
[FEATURE] VM Import/Migration

Open rebeccazzzz opened this issue 2 years ago • 8 comments

Support seamless migration from other virtualization platforms (OpenStack and VMware).

This includes allowing the import of VMDK and other image formats.

Ranked Priorities:

  • [ ] VMware
  • [ ] OpenStack
  • [ ] Others (to be added)

rebeccazzzz avatar May 12 '22 21:05 rebeccazzzz

A script for exporting an OpenStack volume to a qcow2 image: https://gist.github.com/futuretea/bc377d125ee751b3cf88418210ecbe89
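
For context, the rough shape of that export (the gist handles the details, e.g. snapshotting in-use volumes; volume/image names below are placeholders) is to turn the Cinder volume into a Glance image, download it, and convert it locally:

# upload the volume to Glance as an image, then download it
openstack image create --volume <volume-name> <image-name>
openstack image save --file ./<image-name>.raw <image-name>
# assuming the downloaded image is raw, convert it to qcow2
qemu-img convert -p -f raw -O qcow2 ./<image-name>.raw ./<image-name>.qcow2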

futuretea avatar May 16 '22 03:05 futuretea

Another possible alternative is Forklift: https://github.com/konveyor/forklift-operator/blob/main/docs/k8s.md

ibrokethecloud avatar Jul 11 '22 01:07 ibrokethecloud

Hi @ibrokethecloud, should this feature include UI-related changes? If so, please help add the require/ui label. Thanks.

WuJun2016 avatar Aug 10 '22 13:08 WuJun2016

In my opinion, being able to import from the UI would be a useful feature. Maybe a bit out of scope, but using the VMware API to import a VM directly from ESXi or vSphere would be amazing.

msnelling avatar Aug 11 '22 16:08 msnelling

Initial work is available here: https://github.com/harvester/vm-import-controller

ibrokethecloud avatar Aug 16 '22 00:08 ibrokethecloud

Pre Ready-For-Testing Checklist

  • [ ] If labeled: require/HEP Has the Harvester Enhancement Proposal PR been submitted? The HEP PR is at:

  • [ ] Where are the reproduce steps/test steps documented? The reproduce steps/test steps are at:

  • [ ] Is there a workaround for the issue? If so, where is it documented? The workaround is at:

  • [ ] Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)? The PR is at:

    • [ ] Does the PR include the explanation for the fix or the feature?

    • [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at:

  • [ ] If labeled: area/ui Has the UI issue been filed or is it ready to be merged? The UI issue/PR is at:

  • [ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR been submitted or merged? The documentation/KB PR is at:

  • [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test cases? If there is only a test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at:

Automation e2e test issue: harvester/tests#522

Would also like to see XenServer/XCP-ng migration support.

rave-net avatar Sep 20 '22 23:09 rave-net

@ibrokethecloud

As I'm working to familiarize myself with the flow of this, I happened to notice the following.

Setup

  • 1 Harvester v1.1.0-rc1 node, QEMU/KVM, 12C, 24 GiB memory
  • 1 OpenStack instance (DevStack, stable/yoga branch) with 16 CPU & 32 GiB memory, running as a VM on a bare-metal R720 provisioned with cloud-config

CRD files:

suse-workstation-team-harvester➜  testing_gaurav_vm_migration  ᐅ  cat openstack_vm_import.yaml 
apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: buntu-qcow-from-openstack
  namespace: default
spec: 
  virtualMachineName: "5ebb2b4c-eaeb-4307-9de5-b49dfa48080d" #Name or UUID for instance
  networkMapping:
  - sourceNetwork: "shared"
    destinationNetwork: "default/mgmt-vlan"
  sourceCluster: 
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1
suse-workstation-team-harvester➜  testing_gaurav_vm_migration  ᐅ  cat openstack_secret.yaml 
apiVersion: v1
kind: Secret
metadata: 
  name: devstack-credentials
  namespace: default
stringData:
  "username": "admin"
  "password": "secret"
  "project_name": "admin"
  "domain_name": "default"
  "ca_cert": "pem-encoded-ca-cert"
suse-workstation-team-harvester➜  testing_gaurav_vm_migration  ᐅ  cat openstack_object.yaml 
apiVersion: migration.harvesterhci.io/v1beta1
kind: OpenstackSource
metadata:
  name: devstack
  namespace: default
spec:
  endpoint: "http://192.168.1.200/identity"
  region: "RegionOne"
  credentials:
    name: devstack-credentials
    namespace: default
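
For reference, these are applied in the usual order - the credentials Secret and OpenstackSource first, then the VirtualMachineImport - using the filenames above:

kubectl apply -f openstack_secret.yaml
kubectl apply -f openstack_object.yaml
kubectl apply -f openstack_vm_import.yaml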

Virtual Machine That Got Built:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    harvesterhci.io/vmRunStrategy: RerunOnFailure
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
    migaration.harvesterhci.io/virtualmachineimport: buntu-qcow-from-openstack-default
  creationTimestamp: "2022-09-28T00:56:50Z"
  finalizers:
  - wrangler.cattle.io/VMController.UnsetOwnerOfPVCs
  generation: 1
  managedFields:
  - apiVersion: kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:migaration.harvesterhci.io/virtualmachineimport: {}
      f:spec:
        .: {}
        f:runStrategy: {}
        f:template:
          .: {}
          f:metadata:
            .: {}
            f:creationTimestamp: {}
            f:labels:
              .: {}
              f:harvesterhci.io/vmName: {}
          f:spec:
            .: {}
            f:domain:
              .: {}
              f:cpu:
                .: {}
                f:cores: {}
                f:sockets: {}
                f:threads: {}
              f:devices:
                .: {}
                f:interfaces: {}
              f:memory:
                .: {}
                f:guest: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
            f:networks: {}
    manager: vm-import-controller
    operation: Update
    time: "2022-09-28T00:56:50Z"
  - apiVersion: kubevirt.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubevirt.io/latest-observed-api-version: {}
          f:kubevirt.io/storage-observed-api-version: {}
    manager: Go-http-client
    operation: Update
    time: "2022-09-28T00:56:51Z"
  - apiVersion: kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:harvesterhci.io/vmRunStrategy: {}
        f:finalizers:
          .: {}
          v:"wrangler.cattle.io/VMController.UnsetOwnerOfPVCs": {}
    manager: harvester
    operation: Update
    time: "2022-09-28T00:56:51Z"
  - apiVersion: kubevirt.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:created: {}
        f:printableStatus: {}
        f:ready: {}
    manager: Go-http-client
    operation: Update
    subresource: status
    time: "2022-09-28T01:01:25Z"
  name: 5ebb2b4c-eaeb-4307-9de5-b49dfa48080d
  namespace: default
  resourceVersion: "73318"
  uid: 68c753af-876e-4234-a2ce-dab6cfc55dbf
spec:
  runStrategy: RerunOnFailure
  template:
    metadata:
      creationTimestamp: null
      labels:
        harvesterhci.io/vmName: 5ebb2b4c-eaeb-4307-9de5-b49dfa48080d
    spec:
      domain:
        cpu:
          cores: 2
          sockets: 1
          threads: 1
        devices:
          interfaces:
          - bridge: {}
            macAddress: fa:16:3e:fe:0f:96
            model: virtio
            name: migrated-0
        machine:
          type: q35
        memory:
          guest: "3991142400"
        resources:
          limits:
            cpu: "2"
            memory: 4096M
          requests:
            cpu: 125m
            memory: "2730491904"
      networks:
      - multus:
          networkName: default/mgmt-vlan
        name: migrated-0
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-28T01:01:22Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: LiveMigratable
  created: true
  printableStatus: Running
  ready: true

Initial Log Chunk From harvester-system/harvester-harvester-vm-import-controller-uuid

Logs(harvester-system/harvester-harvester-vm-import-controller-85dbbf7649-mx2bw:harvester-vm-import-controller)[1m]
time="2022-09-28T00:56:02Z" level=info msg="Applying CRD vmwaresources.migration.harvesterhci.io"
time="2022-09-28T00:56:03Z" level=info msg="Applying CRD openstacksources.migration.harvesterhci.io"
time="2022-09-28T00:56:03Z" level=info msg="Applying CRD virtualmachineimports.migration.harvesterhci.io"
time="2022-09-28T00:56:04Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VmwareSource controller"
time="2022-09-28T00:56:04Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=OpenstackSource controller"
time="2022-09-28T00:56:04Z" level=info msg="reconcilling openstack soure :default/devstack"
time="2022-09-28T00:56:04Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VirtualMachineImport controller"
time="2022-09-28T00:56:04Z" level=info msg="Starting harvesterhci.io/v1beta1, Kind=VirtualMachineImage controller"
Stream canceled stream error: stream ID 95; INTERNAL_ERROR for harvester-system/harvester-harvester-vm-import-controller-85dbbf7649-mx2bw (harvester-vm-import-controller)

Log From harvester-system/harvester-harvester-vm-import-controller-uuid after killing the pod and letting the pod restart:

│ time="2022-09-28T01:06:49Z" level=info msg="Applying CRD vmwaresources.migration.harvesterhci.io"                               │
│ time="2022-09-28T01:06:49Z" level=info msg="Applying CRD openstacksources.migration.harvesterhci.io"                            │
│ time="2022-09-28T01:06:50Z" level=info msg="Applying CRD virtualmachineimports.migration.harvesterhci.io"                       │
│ time="2022-09-28T01:06:51Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VmwareSource controller"           │
│ time="2022-09-28T01:06:51Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=OpenstackSource controller"        │
│ time="2022-09-28T01:06:51Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VirtualMachineImport controller"   │
│ time="2022-09-28T01:06:51Z" level=info msg="Starting harvesterhci.io/v1beta1, Kind=VirtualMachineImage controller"              │
│ time="2022-09-28T01:06:51Z" level=info msg="reconcilling openstack soure :default/devstack"                                     │
│ time="2022-09-28T01:06:51Z" level=info msg="vm buntu-qcow-from-openstack in namespace default imported successfully"            │
│ Stream canceled stream error: stream ID 57; INTERNAL_ERROR for harvester-system/harvester-harvester-vm-import-controller-85dbb

Screenshot from 2022-09-27 18-00-18 supportbundle_16cbdc41-5c44-4663-a9ca-3e99280a8c7a_2022-09-28T01-02-07Z.zip

From launching the web VNC console, it looks like the boot of the VM failed, as it couldn't read from disk?

I don't see any PVCs built for that VM:

NAMESPACE                  NAME                                                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE    VOLUMEMODE
cattle-monitoring-system   alertmanager-rancher-monitoring-alertmanager-db-alertmanager-rancher-monitoring-alertmanager-0   Bound    pvc-db8dad1f-cf52-4eb0-8aff-2f39c707b911   5Gi        RWO            harvester-longhorn   103m   Filesystem
cattle-monitoring-system   prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0           Bound    pvc-8deb0c5f-d9df-4436-9b67-966a21baa431   50Gi       RWO            harvester-longhorn   103m   Filesystem
cattle-monitoring-system   rancher-monitoring-grafana                                                                       Bound    pvc-c6f437e3-d585-44e5-b24f-2db33fb19814   2Gi        RWO            harvester-longhorn   104m   Filesystem
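
(That listing is from the equivalent of the command below; only the monitoring PVCs show up, and nothing in the default namespace for the imported VM.)

kubectl get pvc --all-namespaces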

Screenshot from 2022-09-27 18-13-21 Screenshot from 2022-09-27 18-13-03

irishgordo avatar Sep 28 '22 01:09 irishgordo

@ibrokethecloud I noticed that the images fail to be acquired when crafting the vm_import yaml:

apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: openstack-demo-from-fremont
  namespace: default
spec: 
  virtualMachineName: "6d62d6c7-5f09-4e01-9ec9-c64d175f50c6" #Name or UUID for instance
  networkMapping:
  - sourceNetwork: "shared"
    destinationNetwork: "default/mgmt-vlan"
  sourceCluster: 
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1

It fires - on the vm-import pod I see:

2022-09-29T03:29:46.592949674Z time="2022-09-29T03:29:46Z" level=info msg="&{1031a54c-9d45-4959-b053-6c4e8a5ec398 creating 1 nova 2022-09-29 03:29:46 +0000 UTC 0001-01-01 00:00:00 +0000 UTC []   lvmdriver-1 4c9645a8-3f3b-4f62-a2a1-4705493bdc81  map[] b7b378ba5eba4fffb5e057a5d58e2914 true false   false}"
2022-09-29T03:29:56.883319245Z time="2022-09-29T03:29:56Z" level=info msg="attempting to create new image from volume"
2022-09-29T03:30:20.888931618Z time="2022-09-29T03:30:20Z" level=info msg="&{fcda5dc9-cd1b-4ea0-b673-3dacbf9d274e creating 1 nova 2022-09-29 03:30:21 +0000 UTC 0001-01-01 00:00:00 +0000 UTC []   lvmdriver-1 0a818c3a-d06b-4d89-87df-ebb33856964b  map[] b7b378ba5eba4fffb5e057a5d58e2914 false false   false}"
2022-09-29T03:30:31.262587506Z time="2022-09-29T03:30:31Z" level=info msg="attempting to create new image from volume"
2022-09-29T03:31:34.64670192Z  Stream canceled stream error: stream ID 55; INTERNAL_ERROR for harvester-system/harvester-harvester-vm-import-controller-85dbbf7649-tgdsl (harvester-vm-import-controller)

Correlating to: Screenshot from 2022-09-28 20-38-19

I'm attaching the support bundle here: supportbundle_dc8579c0-217b-4129-b07a-d6b879e0c9b8_2022-09-29T03-32-59Z.zip

This was on Harvester v1.1.0-rc1, 12C 24GB Memory QEMU/KVM instance

irishgordo avatar Sep 29 '22 03:09 irishgordo

@irishgordo the issue is that the chart name has changed as part of the subchart packaging with Harvester.

Could you please update the deployment for harvester-harvester-vm-import-controller to add an env variable:

SVC=harvester-harvester-vm-import-controller.harvester-system.svc

I will change the chart to handle this as the chart packaging is going to change in rc2.

ibrokethecloud avatar Sep 29 '22 05:09 ibrokethecloud

@ibrokethecloud thanks for mentioning that change :smile: I went ahead and added that as an environment variable (via kubectl --kubeconfig local_laptop_vm.yaml edit deployment harvester-harvester-vm-import-controller -n harvester-system) on the deployment. Still using v1.1.0-rc1, I watched the deployment restart, scaling down and then back up.

But it seems that it just has an error trying to resolve the image when attempting to build a volume?

supportbundle_dc8579c0-217b-4129-b07a-d6b879e0c9b8_2022-09-29T23-41-48Z.zip

Screenshot from 2022-09-29 16-42-13

2022-09-29T23:38:13.548880419Z time="2022-09-29T23:38:13Z" level=info msg="Applying CRD vmwaresources.migration.harvesterhci.io"
2022-09-29T23:38:13.740747291Z time="2022-09-29T23:38:13Z" level=info msg="Applying CRD openstacksources.migration.harvesterhci.io"
2022-09-29T23:38:13.931057316Z time="2022-09-29T23:38:13Z" level=info msg="Applying CRD virtualmachineimports.migration.harvesterhci.io"
2022-09-29T23:38:14.818162984Z E0929 23:38:14.818053       1 memcache.go:196] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
2022-09-29T23:38:14.819747108Z time="2022-09-29T23:38:14Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VirtualMachineImport controller"
2022-09-29T23:38:14.819822577Z time="2022-09-29T23:38:14Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VmwareSource controller"
2022-09-29T23:38:14.819920703Z time="2022-09-29T23:38:14Z" level=info msg="Starting harvesterhci.io/v1beta1, Kind=VirtualMachineImage controller"
2022-09-29T23:38:14.820057872Z time="2022-09-29T23:38:14Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=OpenstackSource controller"
2022-09-29T23:38:14.820074709Z time="2022-09-29T23:38:14Z" level=info msg="reconcilling openstack soure :default/devstack"
2022-09-29T23:38:14.822353121Z time="2022-09-29T23:38:14Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:14.891605119Z time="2022-09-29T23:38:14Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:15.093494129Z time="2022-09-29T23:38:15Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:15.292830978Z time="2022-09-29T23:38:15Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:15.493018895Z time="2022-09-29T23:38:15Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:15.702264105Z time="2022-09-29T23:38:15Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:15.893493823Z time="2022-09-29T23:38:15Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:16.215257809Z time="2022-09-29T23:38:16Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:16.858376602Z time="2022-09-29T23:38:16Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:18.143777343Z time="2022-09-29T23:38:18Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:20.706665743Z time="2022-09-29T23:38:20Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:25.829063681Z time="2022-09-29T23:38:25Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:36.073837478Z time="2022-09-29T23:38:36Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:38:56.926524189Z time="2022-09-29T23:38:56Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"
2022-09-29T23:39:09.067093048Z time="2022-09-29T23:39:09Z" level=info msg="&{a26c67b3-f1b2-47f4-9c3e-080aad61c4ec creating 1 nova 2022-09-29 23:39:08 +0000 UTC 0001-01-01 00:00:00 +0000 UTC []   lvmdriver-1 5194be65-f557-4eb2-9d67-8dd449f78194  map[]
2022-09-29T23:39:19.441995505Z time="2022-09-29T23:39:19Z" level=info msg="attempting to create new image from volume"
2022-09-29T23:39:43.609202059Z time="2022-09-29T23:39:43Z" level=info msg="&{4a4fa93f-1195-4226-bb12-34f82a495c51 creating 1 nova 2022-09-29 23:39:43 +0000 UTC 0001-01-01 00:00:00 +0000 UTC []   lvmdriver-1 caa8e840-f631-43f4-8961-d92b2fd715ca  map[]
2022-09-29T23:39:53.860760319Z time="2022-09-29T23:39:53Z" level=info msg="attempting to create new image from volume"
2022-09-29T23:40:04.975323126Z time="2022-09-29T23:40:04Z" level=error msg="error syncing 'default/openstack-demo-againfremont': handler virtualmachine-import-job-change: error quering vmi in reconcileVMIStatus: virtualmachineimages.harvesterhci.io \"

irishgordo avatar Sep 29 '22 23:09 irishgordo

Hi @irishgordo, I suspect you need to delete and re-create the VirtualMachineImport object.

In the current rc, the vm-import-controller is using ephemeral storage.

With the move to addons soon, we should be able to customize behavior and leverage PVCs.

ibrokethecloud avatar Sep 30 '22 01:09 ibrokethecloud

Hey @ibrokethecloud - that's a great call-out :smile: - I did manage to do that a few times, and also changed the name when re-creating, but was running into a similar issue.

Screenshot from 2022-09-29 19-21-10 supportbundle_dc8579c0-217b-4129-b07a-d6b879e0c9b8_2022-09-30T02-17-34Z.zip

Head "http://harvester-vm-import-controller.harvester-system.svc:8080/openstack-demo-1.img": dial tcp: lookup harvester-vm-import-controller.harvester-system.svc on 10.53.0.10:53: no such host

irishgordo avatar Sep 30 '22 02:09 irishgordo

@ibrokethecloud - after a few more adjustments and edits, things started to click a bit more :smile:

I've been able to get instances from DC and even my local instance of OpenStack (which is zippy) pulling into Harvester by creating YAML files.

As you mentioned: https://github.com/harvester/harvester/issues/2274#issuecomment-1261763549

Upon digging into it, that actually needs to read:

...
  env:
  - name: SVC_ADDRESS
    value: harvester-harvester-vm-import-controller.harvester-system.svc
...

Once that was set, the connection was able to come across.
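
For anyone else hitting this, the same change can be made in one step (assuming the default deployment name and namespace):

kubectl -n harvester-system set env deployment/harvester-harvester-vm-import-controller \
  SVC_ADDRESS=harvester-harvester-vm-import-controller.harvester-system.svc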

Initial Findings, via Base Testing

1. Question: Do we want to allow them to have spec.virtualMachineName be the "Name" of their OpenStack instance?

apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: k3os-node-two-vol
  namespace: default
spec: 
  virtualMachineName: "testAgainK3OSNodeTwoVols" #Name or UUID for instance
  networkMapping:
  - sourceNetwork: "public"
    destinationNetwork: "default/mgmt-vlan"
  sourceCluster: 
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1

N customers/users/orgs may have a non-RFC 1123 naming strategy for their instances. This poses a problem: the images will be built from the N volumes tied to the instance, yet the VM creation will fall into an error state like:

error creating kubevirt VM in createVirtualMachine :VirtualMachine.kubevirt.io "testAgainK3OSNodeTwoVols\" is invalid: metadata.name: Invalid value: \"testAgainK3OSNodeTwoVols\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), requeuing"

2022-09-30T23:38:09.628885055Z harvester-harvester-vm-import-controller-7cf49d6466-fd69l time="2022-09-30T23:38:09Z" level=error msg="error syncing 'default/k3os-node-two-vol': handler virtualmachine-import-job-change: error creating kubevirt VM in createVirtualMachine :VirtualMachine.kubevirt.io \"testAgainK3OSNodeTwoVols\" is invalid: metadata.name: Invalid value: \"testAgainK3OSNodeTwoVols\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), requeuing"
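
As a side note, until the controller normalizes names, the error can be sidestepped by referencing the instance by its UUID rather than its display name (the UUID is already RFC 1123 compliant), i.e. the relevant part of the spec becomes something like:

spec:
  virtualMachineName: "6db2da60-813b-4bb6-96ac-46c85c1bc522" # instance UUID instead of "testAgainK3OSNodeTwoVols"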

2. Question: Do we want to allow the image download to hang if Harvester's Longhorn storage is over-allocated?

For more context: if we have a Harvester node with a single disk that is allocated to something like 107% and Unschedulable (though the used amount on the disk is much less), it seems that the images trying to pull from OpenStack just hang. When I added a second disk to the Harvester VM, created a StorageClass (node-tag: main, disk-tag: extra), and set it as default (removing the original default), the images came across just fine upon creating the VirtualMachineImport CRD. A sketch of that kind of StorageClass follows.
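
A rough sketch of such a StorageClass (Longhorn provisioner with node/disk tag selectors; the class name is arbitrary, and the tag values main/extra are just the ones used in this setup):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-extra-disk
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # make it the default class
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
  nodeSelector: "main"   # Longhorn node tag
  diskSelector: "extra"  # Longhorn disk tag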

Cross-Ref: (hanging downloads): Screenshot from 2022-09-30 13-35-47 (over allocation on default disk that is also Unschedulable) Screenshot from 2022-09-30 14-06-37

Support bundle for this question: supportbundle_dc8579c0-217b-4129-b07a-d6b879e0c9b8_2022-09-30T21-05-38Z.zip

3. Question: Do we want to allow the creation of the VirtualMachineImport CRD resource when the OpenStack VM has no storage associated with it?

In OpenStack we can create a VM that simply boots from a live image like:

openstack server create --image 2f7bca11-ab3f-452e-ac89-441efb715b6f --network cf9f7398-0c69-43b3-89fa-ad5ad8e2ccd9 --flavor 2 testK3osNoDiskGiven

(With that we'll be booting up a K3OS live image, tied to a network with a set flavor, but no disk/volume is associated with it - note the image of the VM in OpenStack booting up with the live image.) X-Ref: Screenshot from 2022-09-30 17-01-29. We can build the CRD YAML and import it, but it will not build any volumes, since the VM in OpenStack had no volumes attached to it; what we end up with is a VM in Harvester that can't really do anything... (Import successful, VM gets built, no volumes - VNC shows no disk for the VM.) Screenshot from 2022-09-30 17-06-43

irishgordo avatar Oct 01 '22 00:10 irishgordo

I wasn't able to consistently reproduce this but I also noticed that cleaning up the dangling Volumes from the attempt of:

  1. Question: Do we want to allow them to have spec.virtualMachineName be the "Name" of their OpenStack instance?

It took a node restart to clean them up; it was strangely returning a 422 back: Screenshot from 2022-10-03 13-52-51 Screenshot from 2022-10-03 13-28-34

This was targeting the VM of:

apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: k3os-node-two-vol
  namespace: default
spec: 
  virtualMachineName: "testAgainK3OSNodeTwoVols" #Name or UUID for instance
  networkMapping:
  - sourceNetwork: "public"
    destinationNetwork: "default/mgmt-vlan"
  sourceCluster: 
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1

Its instance UUID in OpenStack is 6db2da60-813b-4bb6-96ac-46c85c1bc522. As mentioned earlier, it was able to bring over the volumes for the machine, but the VM fails to create due to the RFC 1123 error.

Then, cleaning up the dangling volumes from those screenshots above, I was running into those issues. The workaround was a node restart, after which I was able to delete each of them. Again, I'm not sure if I can consistently reproduce this. supportbundle_287fe6a4-7974-4609-983a-4a61acd31a78_2022-10-03T20-27-48Z.zip

irishgordo avatar Oct 03 '22 21:10 irishgordo

@ibrokethecloud I had additional questions I wanted to ask, too.

1. Question: Should we forbid VirtualMachineImports from being created if the destinationNetwork isn't present within Harvester?

As in:

apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: test-vlan-dont-exist
  namespace: default
spec: 
  virtualMachineName: "6db2da60-813b-4bb6-96ac-46c85c1bc522" #Name or UUID for instance
  networkMapping:
  - sourceNetwork: "public"
    destinationNetwork: "default/mgmt-vlan-not-here"
  sourceCluster: 
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1

Here, spec.networkMapping[0].destinationNetwork doesn't exist within the Harvester cluster. The volumes & images will still be built for the N drives in OpenStack, but the VM will just be stuck in a Stopping state.

2. Question: Should we be building multiple NIC entries for OpenStack VMs that only have a Single NIC interface? Screenshot from 2022-10-03 16-56-59 Screenshot from 2022-10-03 16-51-27 Screenshot from 2022-10-03 16-44-58

You can see that the OpenStack version only has one interface, whereas the Harvester VM imported from OpenStack seemingly has two interfaces (same MAC address). Support bundle:

supportbundle_287fe6a4-7974-4609-983a-4a61acd31a78_2022-10-04T00-01-41Z.zip

irishgordo avatar Oct 04 '22 00:10 irishgordo

@ibrokethecloud - I did also open this; I'm not sure if it's entirely related to how I was building OpenStack resources for VM imports: https://github.com/harvester/harvester/issues/2864

irishgordo avatar Oct 04 '22 00:10 irishgordo

@irishgordo thanks for giving it a thorough run. I will try to answer all questions.

https://github.com/harvester/harvester/issues/2274#issuecomment-1264146111

I will add checks to ensure the VM name is converted to be RFC 1123 compliant.

I am not sure I want to check for over-committed storage allocation, as ideally the cluster operator would need to ensure that enough storage is available to perform the import.

I was not aware of the ability to create VMs in OpenStack without a disk. I can add logic to fail the import when the VM has no disks associated with it.

ibrokethecloud avatar Oct 04 '22 03:10 ibrokethecloud

@irishgordo with regards to: https://github.com/harvester/harvester/issues/2274#issuecomment-1266052747

There is a check which will clean up object ownership once the VM is in a Running state. Deleting the VirtualMachineImport object before the VM is running should automatically trigger the cleanup. This includes import failure scenarios too.

ibrokethecloud avatar Oct 04 '22 03:10 ibrokethecloud

@irishgordo for https://github.com/harvester/harvester/issues/2274#issuecomment-1266220335

If the source-to-destination mapping is invalid for any reason (not limited to incorrect network definitions or the network not being present), then the interface will by default be mapped to the management network.

ibrokethecloud avatar Oct 04 '22 03:10 ibrokethecloud

@irishgordo for https://github.com/harvester/harvester/issues/2274#issuecomment-1266220335, I need to check the API response for the VMs. I suspect this is happening because OpenStack is creating two interfaces, one for the internal network and one for public consumption. But I can't be sure until I see the API response.

ibrokethecloud avatar Oct 04 '22 03:10 ibrokethecloud

@ibrokethecloud thanks for responding to all of those questions :smile: !

Working through this with vSphere initially, I have noticed some operation failures in Tasks and opened this issue: https://github.com/harvester/harvester/issues/2879

irishgordo avatar Oct 05 '22 19:10 irishgordo

Also, working through more vSphere testing, I'm noticing this issue as well: https://github.com/harvester/harvester/issues/2880

irishgordo avatar Oct 05 '22 22:10 irishgordo

And this seems to be a smaller issue, as I don't think it's really affecting the functionality of the import: https://github.com/harvester/harvester/issues/2881

irishgordo avatar Oct 05 '22 22:10 irishgordo

@ibrokethecloud 1. Question: Should VM import respect the default StorageClass? As in, while importing a VM, that VM's disks, images, and volumes should all use the selected default StorageClass, correct?
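
(For reference, a quick way to see which class would be picked up as the default, assuming kubectl access to the cluster - the class flagged "(default)" carries the storageclass.kubernetes.io/is-default-class annotation:)

kubectl get storageclass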

irishgordo avatar Oct 06 '22 17:10 irishgordo

@ibrokethecloud for,

"Question 2" from this comment the other day: https://github.com/harvester/harvester/issues/2274#issuecomment-1266220335

I opened: https://github.com/harvester/harvester/issues/2890#issue-1400437398

irishgordo avatar Oct 07 '22 00:10 irishgordo

I'm not 100% sure if it's easy to reproduce, but I've seen a few mixed things with importing a large OpenStack VM:

level=error msg="error syncing 'default/massive-disk-filled-vm': handler virtualmachine-import-job-change: error exporting virtual machine: error converting qcow2 to raw file: exit status 1, requeuing"

I'm thinking some of it may have to do with my environment more than anything.

In OpenStack, the disk may potentially be stuck in an uploading state; the vm-import-controller will requeue the request and try to build another volume, though the original may change from "Uploading" to "Available" slightly later.

Screenshot from 2022-10-07 14-40-48

Question: is there a way, through editing the addon, to add something like a "wait period" before trying to requeue again?

I guess I'm just curious, because a large disk that has 100G written to it Screenshot from 2022-10-07 13-18-58

may end up taking a bit longer? Again, I'm not too sure if it's my environment or not in some cases. I do know that I've been able to build an image from a VM that was using a 50G disk, yet that VM didn't have the disk filled to the brim with data, so its exported image would, I'm assuming, be smaller due to thin provisioning - versus a 120G disk that has 100G used on it.

irishgordo avatar Oct 07 '22 22:10 irishgordo

@ibrokethecloud working through a few tests, I do notice that if we try to remove and rebuild a VirtualMachineImport in fairly quick succession, there seems to be a bit of a doubling up of the image download for a VM that may have just a single disk:

https://github.com/harvester/harvester/issues/2901

irishgordo avatar Oct 10 '22 23:10 irishgordo

@ibrokethecloud regarding the testing, I've validated the following scenarios with v1.1.0-rc2:

  1. VM Import Can Happen Successfully For Single Node w/ OpenStack
  2. VM Import Can Happen Successfully For Single Node w/ vSphere
  3. VM Import Can Happen Successfully For 3 Node Harvester Cluster w/ OpenStack
  4. VM Import Can Happen Successfully For 3 Node Harvester Cluster w/ vSphere
  5. VM Import Can Happen w/ Multiple VMs defined in yaml from OpenStack
  6. VM Import Can Happen w/ Multiple VMs defined in yaml from vSphere
  7. VM Import when spec.networkMapping[0].sourceNetwork is defined as a nonexistent bogus network defaults to a pod-network of type masquerade on the management Network w/ virtio in OpenStack
  8. VM Import when spec.networkMapping[0].sourceNetwork is defined as a nonexistent bogus network defaults to a pod-network of type masquerade on the management Network w/ virtio in vSphere
  9. When you try to import a VM that has a spec.virtualMachineName that is camel-cased, within the logs we see: level=info msg="vm focal-bad-name in namespace default has an invalid spec" && level=error msg="vm migration target testFocalCamelCase in VM focal-bad-name in namespace default is not RFC 1123 compliant"
  10. Test VM Import on OpenStack by UUID & Name
  11. Test Canceling out a VM Import before Image(s) download finish by deleting the yaml crd, removes Images in OpenStack
  12. Test Canceling out a VM Import before Image(s) download finish by deleting the yaml crd, removes Images in vSphere

Are there any other large cases that you feel should be covered? (Tests are loosely documented in a working draft PR against your open tests PR here: https://github.com/ibrokethecloud/tests/pull/2)

irishgordo avatar Oct 11 '22 02:10 irishgordo