cloud-provider-openstack

[occm] Add support for trunk ports

Open jingczhang opened this issue 4 years ago • 86 comments

Currently, getAttachedInterfacesByID() only returns neutron ports directly attached to the VM. For vlan-aware VMs that have trunk ports, subports attached to the trunk ports are not returned. This pull request checks whether the VM has trunk ports and, if so, adds their subports to the returned list.
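In outline, the change works as sketched below, using the gophercloud trunks extension; the helper name is illustrative, not the exact code in this PR:

```go
package openstack

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/trunks"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

// subportsOf lists trunks whose parent is the given port and resolves each
// subport to its full Neutron port, so the caller can pick up the fixed IPs
// assigned on the subports. (Illustrative helper, not the PR's exact code.)
func subportsOf(network *gophercloud.ServiceClient, parentPortID string) ([]ports.Port, error) {
	pages, err := trunks.List(network, trunks.ListOpts{PortID: parentPortID}).AllPages()
	if err != nil {
		return nil, err
	}
	trunkList, err := trunks.ExtractTrunks(pages)
	if err != nil {
		return nil, err
	}
	var subports []ports.Port
	for _, trunk := range trunkList {
		for _, sp := range trunk.Subports {
			port, err := ports.Get(network, sp.PortID).Extract()
			if err != nil {
				return nil, err
			}
			subports = append(subports, *port)
		}
	}
	return subports, nil
}
```

getAttachedInterfacesByID() can then append these subports to the interfaces it already returns, making the addresses assigned on the subports visible to the node controller.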

Signed-off-by: Jing Zhang [email protected]

```release-note
NONE
```

jingczhang avatar May 17 '21 13:05 jingczhang

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar May 17 '21 13:05 k8s-ci-robot

Welcome @jingczhang!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar May 17 '21 13:05 k8s-ci-robot

Hi @jingczhang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar May 17 '21 13:05 k8s-ci-robot

Build succeeded.

theopenlab-ci[bot] avatar May 17 '21 13:05 theopenlab-ci[bot]

Build succeeded.

theopenlab-ci[bot] avatar May 17 '21 13:05 theopenlab-ci[bot]

Build failed.

theopenlab-ci[bot] avatar May 17 '21 13:05 theopenlab-ci[bot]

Build succeeded.

theopenlab-ci[bot] avatar May 17 '21 16:05 theopenlab-ci[bot]

/ok-to-test

jichenjc avatar May 18 '21 01:05 jichenjc

Thanks for the contribution!

But first, I want to know more about the problem this PR is trying to fix. What issues have you seen? Why are the k8s nodes using trunk ports, which are normally used in NFV use cases? Is there any particular reason?

lingxiankong avatar May 20 '21 02:05 lingxiankong

Hi,

I had tried to put more information for the PR in the commit message but messed up the format, so it did not show up.

Major reasons for this PR are:

(1) To support CNFs that tag their own VLANs, i.e. VLAN-aware CNFs

There is ongoing work on the sriov-cni side to enable this: https://github.com/k8snetworkplumbingwg/sriov-cni/issues/133 (Implement interface for vlan trunking)

(2) To align Kubernetes deployment over VMs with deployment over BM (bare metal)

So that, from the CNF point of view, there is the same "look and feel": the same CNFs can be deployed in exactly the same way regardless of the underlying infrastructure, i.e. virtualized or not.

By extension, this enables the same SDN solution for Kubernetes deployments, whether over VMs or over BM.

Please check and review.

Jing

jingczhang avatar May 20 '21 13:05 jingczhang


Thanks for that info. I totally understand how vlan-aware ports work and k8s deployment over VMs and BM. What I was asking about is the real use case. Are you working with an environment that already uses vlan-aware ports in a k8s VM deployment and have you seen issues with openstack-cloud-controller-manager? Or do you just want to contribute something that may be useful in the future?

Another thing I'm not quite sure about, but which is very important to openstack-cloud-controller-manager: have you tested whether the subports in a neutron trunk can work with Octavia?

lingxiankong avatar May 20 '21 20:05 lingxiankong

Hi,

This PR solves a real issue in a deployment scenario where VMs are on flat networks: the VMs are assigned trunk ports, and IPs are assigned on the subports. Without this PR, those VMs cannot be initialized as Kubernetes nodes, since openstack-cloud-controller-manager does not see the VMs' IPs on the subports.

This PR has no impact on VMs not assigned trunk ports. Can you expand on the concern with Octavia?

Jing

jingczhang avatar May 21 '21 12:05 jingczhang


Could you please show me the port information of the k8s worker nodes in your deployment? Because I have neither such an environment nor CI for it, could you also test with this PR that the Service can be created successfully? I mentioned Octavia because openstack-cloud-controller-manager creates the Octavia load balancer with the subnet info of the worker nodes; I'm wondering which subnet the load balancer is going to use and which subnet is used for adding load balancer members.

Sorry for so many questions, but I just want to fully understand the use case, as this is the first time I've heard of creating a k8s cluster on top of VLAN-aware VMs.

lingxiankong avatar May 23 '21 10:05 lingxiankong

Hi,

(1) Below are the excerpts from logs without and with the PR:

=== without the PR ===

```
I0525 15:39:22.462829 1 node_controller.go:325] Initializing node mycluster-01-control-01 with cloud provider
E0525 15:39:24.132876 1 node_controller.go:370] failed to initialize node mycluster-01-control-01 at cloudprovider: failed to find kubelet node IP from cloud provider
```

=== with the PR ===

```
I0525 18:29:06.924591 1 node_controller.go:390] Initializing node mycluster-01-control-01 with cloud provider
I0525 18:29:09.859509 1 openstack.go:698] Claiming to support Zones
I0525 18:29:10.517786 1 node_controller.go:454] Successfully initialized node mycluster-01-control-01 with cloud provider
```

(2) Below are excerpts showing the ports (trunks) attached to the nova instance and their sub-ports.

```
[stack@undercloud (overcloudrc) ~]$ nova interface-list mycluster-01-control-01
+------------+--------------------------------------+--------------------------------------+----------------------------------+-------------------+-----+
| Port State | Port ID                              | Net ID                               | IP addresses                     | MAC Addr          | Tag |
+------------+--------------------------------------+--------------------------------------+----------------------------------+-------------------+-----+
| ACTIVE     | 46c7638a-bb37-4ee2-a9cb-be3f4b29f6a5 | 13dd900a-4cfa-4cff-8ad4-ead0e1bfe9ef | 13.13.13.206                     | fa:16:3e:a2:1f:6b | -   |
......
```

```
[stack@undercloud (overcloudrc) ~]$ openstack network trunk list
+--------------------------------------+---------------------------------------------+--------------------------------------+-------------+
| ID                                   | Name                                        | Parent Port                          | Description |
+--------------------------------------+---------------------------------------------+--------------------------------------+-------------+
| b1d0b4fa-dc26-4729-b1eb-c6f63304d911 | mycluster-01-control-01-infra_bond_trunk_1  | 46c7638a-bb37-4ee2-a9cb-be3f4b29f6a5 |             |
......
```

```
[stack@undercloud (overcloudrc) ~]$ openstack network subport list --trunk mycluster-01-control-01-infra_bond_trunk_1
+--------------------------------------+-------------------+-----------------+
| Port                                 | Segmentation Type | Segmentation ID |
+--------------------------------------+-------------------+-----------------+
| e86489cb-f37e-424d-8e10-2c1f448e69f9 | vlan              |              50 |
| 2f60a3d1-ad2b-43ae-b9e7-cb64f582449a | vlan              |             701 |
+--------------------------------------+-------------------+-----------------+
```

```
[stack@undercloud (overcloudrc) ~]$ openstack port show 2f60a3d1-ad2b-43ae-b9e7-cb64f582449a | grep fixed_ip
...
| fixed_ips | ip_address='10.10.10.50', subnet_id='b2f87457-af78-4dcd-b03d-aa0b49423535' |
...
```

(3) With regard to your question "could you also test with this PR that the service can be created successfully": can you please recommend a specific test case? I mean a specific one with steps, to make sure there is no confusion.

Jing

jingczhang avatar May 25 '21 18:05 jingczhang

Thanks for the info, please see my comments inline.

> Below are the excerpts from logs without and with the PR:
>
> === without the PR ===
> I0525 15:39:22.462829 1 node_controller.go:325] Initializing node mycluster-01-control-01 with cloud provider
> E0525 15:39:24.132876 1 node_controller.go:370] failed to initialize node mycluster-01-control-01 at cloudprovider: failed to find kubelet node IP from cloud provider

From the console output you posted, I see the trunk port has an IP, 13.13.13.206, which should be returned by the existing getAttachedInterfacesByID(); why did openstack-cloud-controller-manager fail to get the node IP?

> (3) With regard to your question "could you also test with this PR that the service can be created successfully": can you please recommend a specific test case? I mean a specific one with steps, to make sure there is no confusion.

The main function of openstack-cloud-controller-manager is to provide the implementation of Service of type LoadBalancer; please make sure such a Service can be created successfully with the change.
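For example, a minimal Service like the following would exercise that path (a sketch only; it assumes some workload labelled app=echo already exists, and all names are illustrative):

```yaml
# Minimal LoadBalancer Service for manual verification.
# Assumes a Deployment with pods labelled app=echo is running;
# names and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: trunk-port-lb-test
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 8080
```

If the Service receives an external IP and the Octavia members become healthy, that would confirm member subnet selection also works when the node IPs live on subports.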

lingxiankong avatar May 25 '21 22:05 lingxiankong

Hi,

(1) It is because the node IP 10.10.10.50 is on the VLAN interface (subport), not on the trunk port.

(2) Just to be sure I follow you: I have assumed this PR is transparent to deployments not using trunk ports (by the nature of the PR itself). Looking at the verification test runs now, I see it failed two tests (test-e2e-comformance-stable-brach-v1.19 and test-lb-octavia). You suggest we verify Octavia; we will verify that. Supposing that is done, would the test-lb-octavia test most likely pass?

Is there a way to check why the above two tests failed?

Jing

jingczhang avatar May 26 '21 13:05 jingczhang

> (1) It is because the node IP 10.10.10.50 is on the VLAN interface (subport), not on the trunk port.

I know the subport also has an IP; I was asking why 13.13.13.206 was not returned by the existing getAttachedInterfacesByID() function. Maybe I missed something, but even with your change, 13.13.13.206 is still visible to the cloud controller manager.

> (2) Just to be sure I follow you: I have assumed this PR is transparent to deployments not using trunk ports (by the nature of the PR itself). Looking at the verification test runs now, I see it failed two tests (test-e2e-comformance-stable-brach-v1.19 and test-lb-octavia). You suggest we verify Octavia; we will verify that. Supposing that is done, would the test-lb-octavia test most likely pass?

Please ignore those two tests: test-e2e-comformance-stable-brach-v1.19 doesn't test Service functions, and test-lb-octavia has been disabled because the backend k8s cluster is still running an old version that can't work with the master openstack-cloud-controller-manager. That's why we need to manually verify all PRs related to openstack-cloud-controller-manager until the job is fixed and added back.

lingxiankong avatar May 26 '21 21:05 lingxiankong

Hi,

(1) This PR is NOT about removing the IPs assigned on the ports directly attached to the instance (the trunk ports); this is why I said the PR is transparent to instances not using trunk ports. It additionally returns the IPs assigned on the sub-ports (VLAN ports). The node IP (10.10.10.50) is assigned on a sub-port; the VM is VLAN-aware, and this node IP is just one of the IPs assigned on the VM's VLAN interfaces.

(2) Thanks for clarifying that the failed tests are irrelevant. I will follow up on the Octavia test; we do use Octavia.

Jing

jingczhang avatar May 27 '21 12:05 jingczhang


Sorry, you didn't answer my question. In your log:

> E0525 15:39:24.132876 1 node_controller.go:370] failed to initialize node mycluster-01-control-01 at cloudprovider: failed to find kubelet node IP from cloud provider

It says node_controller can't find the node IP, but the trunk port does have an IP in your console output. I am trying to better understand the bug you want to fix here. Unfortunately, I don't think this PR could fix that bug.

lingxiankong avatar May 27 '21 22:05 lingxiankong

Hi,

Not sure what exactly is unclear to you. Let me try again by breaking the details down step by step:

1. The VMs are VLAN-aware.
2. The VMs are attached to trunk ports; the trunk ports are created on flat networks. The IPs assigned on the trunk ports are unused.
3. The IPs on the VLAN interfaces are assigned on the sub-ports (on VLAN networks). One of those IPs is the Kubernetes node IP.
4. Without this PR, the cloud controller only returns IPs on the neutron ports directly attached to the VMs (the trunk ports, in this case).
5. With this PR, the cloud controller additionally returns IPs on the sub-ports, one of which is the Kubernetes node IP.
6. This PR HAS resolved the real-life issue.
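In CLI terms, such a setup is created roughly as follows (a sketch only; all names are illustrative):

```
# Parent (trunk) port on the flat network; its fixed IP is not used by the node.
openstack port create --network flat-net parent-port-1

# Subport on the VLAN network; this is where the node IP lives.
openstack port create --network vlan-net-701 subport-701

# The trunk ties the subport to the parent port with a segmentation ID.
openstack network trunk create --parent-port parent-port-1 \
    --subport port=subport-701,segmentation-type=vlan,segmentation-id=701 trunk-1

# The VM attaches only the parent port; the subports ride on the trunk.
openstack server create --image <image> --flavor <flavor> \
    --nic port-id=parent-port-1 node-vm
```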

I hope it is clear this time.

Jing

jingczhang avatar May 28 '21 12:05 jingczhang

> Without this PR, the cloud controller only returns IPs on the neutron ports directly attached to the VMs (the trunk ports, in this case).

This step is not clear to me. You said the cloud controller could find an IP (useless or not), so how do you explain the log message "failed to find kubelet node IP from cloud provider"?

> With this PR, the cloud controller additionally returns IPs on the sub-ports, one of which is the Kubernetes node IP.

How would the cloud controller choose the right subport IP rather than the trunk port IP?

lingxiankong avatar May 29 '21 10:05 lingxiankong

Hi,

(1) The node IP is on the sub-port, not on the trunk port.
(2) IPs on the trunk ports are filtered out via internal-network-name.
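For reference, the relevant part of the OCCM cloud config looks roughly like this (the network name is illustrative):

```
[Networking]
# Report node addresses only from this network; addresses on the flat
# network holding the trunk parent ports are ignored.
internal-network-name = vlan-net-701
```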

We expect to verify Octavia with this networking scheme sometime in June, and will update then.

Jing

jingczhang avatar May 31 '21 14:05 jingczhang

Build succeeded.

theopenlab-ci[bot] avatar Jun 14 '21 13:06 theopenlab-ci[bot]

Build succeeded.

theopenlab-ci[bot] avatar Jun 14 '21 13:06 theopenlab-ci[bot]

Build failed.

theopenlab-ci[bot] avatar Jun 14 '21 14:06 theopenlab-ci[bot]