terraform-provider-vsphere icon indicating copy to clipboard operation
terraform-provider-vsphere copied to clipboard

feat: Add support for SR-IOV Network Adapters

Open sunnycarter opened this issue 3 years ago • 9 comments

Description

This pull request adds SR-IOV network_interface function to terraform, so that you can add SR-IOV Nics to a VM.

SR-IOV, Single Root I/O Virtualization, allows a single NIC (termed the Physical Function - PF) to be shared between a bounded number of VMs, providing a Virtual Function (VF) network interface to each VM. It is used for high-performance throughput on the network interface.

An SR-IOV network interface subresource would be declared in terraform main.tf as follows:

resource "vsphere_virtual_machine" "vm" {
  # ... other configuration ...
  host_system_id      = data.vsphere_host.host.id
  memory              = var.memory
  memory_reservation  = var.memory
  network_interface  {
    network_id        = data.vsphere_network.network.id
    adapter_type      = sriov
    physical_function = "0000:3b:00.1" 
  }
  ... other network_interfaces... 
}

If you set the adapter_type to sriov, you must also do the following:

  • ensure the host_system_id is set
  • ensure that memory_reservation == memory
  • set the physical_function on the interface to a SRIOV enabled physical adapter address
  • Additionally, bandwidth settings are not relevant to SRIOV nics.

There are limitations to the use of SRIOV NICS, due to fiddlyness with their PCI Bus unit numbers. (Non-SRIOV interfaces use unit numbers 7-17, whereas SRIOV interfaces use unit numbers 45-36 decreasing.) As much of the code for network adapters uses unit numbers as akin to index number of the interface in main.tf, this has made the changes fiddly. The limitations created here mean:

  • All SRIOV Nics must be declared after all non-SRIOV Nics.
  • You cannot add or delete non-SRIOV nics and update, if you have any SRIOV nics in the config.

Acceptance tests

  • [ ] Have you added an acceptance test for the functionality being added? NO
  • [ ] Have you run the acceptance tests on this branch? NO

I have not run acceptance tests as my vsphere environment does not have access to any NAS storage, so I cannot set environment variable TF_VAR_VSPHERE_NAS_HOST or TF_VAR_VSPHERE_NFS_PATH, so tests are failing with ``` make testacc TESTARGS="-run=TestAccResourceVSphereVirtualMachine_basic" Error: error mounting datastore: host "<TF_VAR_VSPHERE_ESXI2>": ServerFaultCode: A specified parameter was not correct:

      with vsphere_nas_datastore.ds1,
      on terraform_plugin_test.tf line 29, in resource "vsphere_nas_datastore" "ds1":
      29: resource "vsphere_nas_datastore" "ds1" {```

. I'd appreciate guidance on how to get around this, and which acceptance tests to run if it is possible. I've made notes on manual testing below.

Manual testing

The following manual tests have been done: Error cases:

  • Missing host
  • Invalid Physical function
  • Memory not reserved
  • SRIOV not enabled on physical function
  • Simple - change adapter type to sriov from vmxnet
  • Simple - change adapter type to vmxnet to sriov
  • Missing physical function on sriov nic
  • Physical function on vmxnet nic
  • Wrong host

Apply new:

  • 2x VMXnet then 2xSRIOV
  • Vmxnet, SRIOV, vmxnet, SRIOV
  • 10 VMXnet, 10 SRIOV
  • 10 VMXnet, 11 SRIOV
  • 2 SRIOV
  • 2xVM pair with 2VMxnet 2SRIOV
  • Bandwidth limit on vmxnet
  • e1000 network
  • 3 non-SRIOV

Update (new interfaces):

  • Additional SRIOV new resource indexes
  • Additional VMXnet new resource indexes no SRIOV
  • 2VMXnet, 2 SRIOV add new VMXnet before SRIOV
  • Switch physical adapter on two SRIOV NICs
  • Vmxnet bandwidth limit

Delete interfaces:

  • Delete SRIOV NICs from end of list
  • Delete SRIOV NICs within list
  • Delete Vmxnet NIC at end of VMxnet list with no SRIOV
  • Delete Vmxnet within list containing SRIOV after
  • Delete vmxnet within list containing only Vmxnet

Import:

  • 2x VMXnet then 2xSRIOV
  • 6 VMXnet, 6 SRIOV
  • e1000
  • Bandwidth limit non default

Bandwidth stuff:

  • One vmxnet without, one with non-defaults, one sriov, apply
  • same as above but import then plan
  • Add new SRIOV NIC - Update
  • Update vmxnet bandwidth setting
  • Import after the above
  • Bandwidth setting on SRIOV nic

Non-interface tests:

  • Update some other non NIC function
  • Import without SRIOV

Release Note

Release note for CHANGELOG:

FEATURES:

resource/vm/network_interface: Add adapter_type sriov which requires the network_interface to have a physical_function declared.  This new network_interface adapter_type allows vsphere_virtual_machine to be created with SRIOV Network Adapters. See PR for more details. (@@@TBD PR number)

References

https://github.com/vmware/govmomi/issues/2060 Resolves #1136

sunnycarter avatar May 25 '21 14:05 sunnycarter

CLA assistant check
All committers have signed the CLA.

hashicorp-cla avatar May 25 '21 14:05 hashicorp-cla

This fixes https://github.com/hashicorp/terraform-provider-vsphere/issues/1136

andrew-lee-1089 avatar May 25 '21 14:05 andrew-lee-1089

Any update on this @koikonom and other Authorized users /approvers? What's the process for getting this reviewed and merged?

andrew-lee-1089 avatar Jun 25 '21 09:06 andrew-lee-1089

Any update on this? is this ever going to be reviewed / merged?

andrew-lee-1089 avatar Sep 21 '21 09:09 andrew-lee-1089

Checking for an update, I need this for an environment as well.

RichBrumpton avatar Oct 25 '21 18:10 RichBrumpton

Hi there - is there any update on the review for this? What are the next steps please?

sunnycarter avatar Apr 01 '22 16:04 sunnycarter

Hi @sunnycarter. We apologize for the long delay. We are currently not at a place define timelines on larger features at this time. We will continue to gauge the engagement on this issue and do our best to prioritize it.

Thanks.

tejavar avatar Apr 01 '22 17:04 tejavar

@sunnycarter @andrew-lee-metaswitch - could you review and address the merge conflicts? If so, I’ll take a look at reviewing and onboarding this capability.

Meanwhile, I’ll mark this as draft.

Ryan Johnson Senior Staff Solutions Architect | Product Engineering @ VMware, Inc.

tenthirtyam avatar Sep 11 '22 21:09 tenthirtyam

Yes I will look into merge conflicts next week if possible. It's been a long time so I think it may take me a while to work through them. Thanks for saying you'll then look into it - we are very keen for this to get reviewed and merged in still.

sunnycarter avatar Sep 23 '22 16:09 sunnycarter

@tenthirtyam please can I ask for some guidance on sorting out these merge conflicts? When I wrote this Pull Request, the vmware/govmomi code was included within this terraform-provider-vsphere repo, and I had to make some changes to one file (which was vendor/github.com/vmware/govmomi/object/virtual_device_list.go, now deleted from this repo)

The vmware/govmomi code source no longer seems to be in this repo is now referenced to the source repo for govmomi, and the file that I want to change now resides here: https://github.com/vmware/govmomi/blob/master/object/virtual_device_list.go#L921 (I can't find the changelog that made this change)

I assume therefore I would need to submit a PullRequest on that repo, and get the version bumped, then pull in that new version of govmomi into this PullRequest before proceeding?

Am I correct, and do you know anything about the speed of that process?

sunnycarter avatar Sep 26 '22 14:09 sunnycarter

@tenthirtyam Ryan Johnson, I've made an attempt at dealing with the merge conflicts here, which included creating a Pull Request on govmomi : https://github.com/vmware/govmomi/pull/2957

My question above still stands as to whether this was the correct course of action, as I'm not quite sure what the history of this file was. The earlier commits to this terraform pull request show the previous change I made in this repo. Annoyingly, I can't remember if the change to that file was purely cosmetic, or was actually required for these SRIOV changes to work. I'm going with the assumption that it was required for now.

sunnycarter avatar Sep 26 '22 17:09 sunnycarter

Hi Sunny - I think you make the best steps forward here. The provider no longer requires the vendor directory and instead references this from the upstream.

I've cc: the maintainer in the upstream vmware/govmomi issue and have added an upstream/govmoni label to this PR since we'd need to await an upstream merge. Afterwhich, we'll open a separate PR to update vmware/govmomi for this provider.

Ryan Johnson Senior Staff Solutions Architect | Product Engineering @ VMware, Inc.

tenthirtyam avatar Sep 26 '22 17:09 tenthirtyam

Upstream PR merged.

tenthirtyam avatar Sep 30 '22 15:09 tenthirtyam

@tenthirtyam Hi Ryan, Now that the Upstream PR https://github.com/vmware/govmomi/pull/2957 merged what are the next steps to getting this PR merged please? Is there a way you (or I) can request a new release of govmomi so that it can be pulled into this PR and we can get it merged??

sunnycarter avatar Dec 12 '22 18:12 sunnycarter

Yes, we'd need an upstream release of vmware/govmomi and subsequent bump (and testing) here.

tenthirtyam avatar Dec 12 '22 18:12 tenthirtyam

Good news is that @akutz just released vmware/[email protected] today.

tenthirtyam avatar Dec 12 '22 19:12 tenthirtyam

Dependent on https://github.com/hashicorp/terraform-provider-vsphere/pull/1801.

tenthirtyam avatar Dec 13 '22 04:12 tenthirtyam

Thank you very much for this review. I've resolved all the requested changes apart from the one that was a question, where I've tried to explain my reasoning.

I'm handing back to you in case you wanted to review further. I know we are waiting for https://github.com/hashicorp/terraform-provider-vsphere/pull/1801 to be merged so that it can be pulled into this one before this can be merged?

sunnycarter avatar Dec 19 '22 18:12 sunnycarter

Thank you very much for this review. I've resolved all the requested changes apart from the one that was a question, where I've tried to explain my reasoning.

I'm handing back to you in case you wanted to review further. I know we are waiting for #1801 to be merged so that it can be pulled into this one before this can be merged?

Thanks, @sunnycarter - I'll take a look at testing this change right after the holidays. Appreciate the patience - hope to see this one pulled in very soon.

tenthirtyam avatar Dec 19 '22 19:12 tenthirtyam

Added to v2.3.0 queue for testing based on status of reviews to date - could push to v2.4.0.

tenthirtyam avatar Dec 19 '22 19:12 tenthirtyam

If possible would love to push for this to be added to v2.3.0 release as this is a feature my team and I could very much use

JoshuaFurman avatar Dec 22 '22 23:12 JoshuaFurman

If possible would love to push for this to be added to v2.3.0 release as this is a feature my team and I could very much use

Please use the recommended community practice of adding a 👍 reaction to the linked enhancement issue.

tenthirtyam avatar Dec 23 '22 05:12 tenthirtyam

@sunnycarter - I noticed that one of the test is failing:

==> Checking that code complies with gofmt requirements...
[5](https://github.com/hashicorp/terraform-provider-vsphere/actions/runs/3733914469/jobs/6336346585#step:6:6)
gofmt needs running on the following files:
[6](https://github.com/hashicorp/terraform-provider-vsphere/actions/runs/3733914469/jobs/6336346585#step:6:7)
./vsphere/internal/virtualdevice/virtual_machine_network_interface_subresource.go
[7](https://github.com/hashicorp/terraform-provider-vsphere/actions/runs/3733914469/jobs/6336346585#step:6:8)
You can use the command: `make fmt` to reformat code.
[8](https://github.com/hashicorp/terraform-provider-vsphere/actions/runs/3733914469/jobs/6336346585#step:6:9)
make: *** [GNUmakefile:46: fmtcheck] Error 1
[9](https://github.com/hashicorp/terraform-provider-vsphere/actions/runs/3733914469/jobs/6336346585#step:6:10)
Error: Process completed with exit code 2

tenthirtyam avatar Dec 23 '22 05:12 tenthirtyam

Sorry about that, I've run it now

sunnycarter avatar Jan 04 '23 16:01 sunnycarter

Can I inquire about the status of this PR? This feature would prove extremely useful for my team

JoshuaFurman avatar Feb 28 '23 16:02 JoshuaFurman

Can I inquire about the status of this PR? This feature would prove extremely useful for my team

Please refer to the linked milestone.

tenthirtyam avatar Feb 28 '23 16:02 tenthirtyam

Sorry about that, I've run it now

Hi @sunnycarter - there still appears to be a conflict on vsphere/internal/virtualdevice/virtual_machine_network_interface_subresource_test.go.

tenthirtyam avatar Feb 28 '23 16:02 tenthirtyam

Sorry about that, I've run it now

Hi @sunnycarter - there still appears to be a conflict on vsphere/internal/virtualdevice/virtual_machine_network_interface_subresource_test.go.

@tenthirtyam I've merged main in again now and resolved the conflict. Is there anything now stopping this being merged?

sunnycarter avatar Mar 10 '23 16:03 sunnycarter

Hello Folks!

We will make an effort to prioritize contributions, which we sincerely appreciate. The provider is long overdue for some technical debt work. This feature along with a few others, are innately harder to test. The current focus is to streamline and improve the testing and CI for this provider, this PR was a small step in that direction, there should be more PRs soon around this.

Once we can establish some CI workflow for the provider, we will make an effort to prioritize feature work. We appreciate your patience and contributions.

appilon avatar Mar 10 '23 17:03 appilon

Thank you for working towards this!

JoshuaFurman avatar Mar 10 '23 17:03 JoshuaFurman