ovn-kubernetes icon indicating copy to clipboard operation
ovn-kubernetes copied to clipboard

make svc_template_support configurable, default is enabled

Open cathy-zhou opened this issue 10 months ago • 6 comments

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

cathy-zhou avatar Apr 19 '24 19:04 cathy-zhou

Deploy Preview for subtle-torrone-bb0c84 ready!

Name Link
Latest commit da4a3741d8027b04c9b9a6624fd422004f2078d4
Latest deploy log https://app.netlify.com/sites/subtle-torrone-bb0c84/deploys/6622c4dc25c244000883e562
Deploy Preview https://deploy-preview-4293--subtle-torrone-bb0c84.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Apr 19 '24 19:04 netlify[bot]

Is there a reason why we don't want to use LB Templates?

tssurya avatar Apr 19 '24 19:04 tssurya

Coverage Status

Changes unknown when pulling da4a3741d8027b04c9b9a6624fd422004f2078d4 on cathy-zhou:svc_template_configurable into ** on ovn-org:master**.

coveralls avatar Apr 19 '24 19:04 coveralls

Is there a reason why we don't want to use LB Templates?

There are several issues we found downstream. @cathy-zhou can you please raise issues for:

  1. Performance regression while using service templates.
  2. If a K8s nodes have different set of IP addresses, then the template doesn't work as expected. The ovn-controller on nodes with lesser number of IPs will interpret the template variable as a 'raw string' and spews out lot of errors.

girishmg avatar Apr 22 '24 15:04 girishmg

Is there a reason why we don't want to use LB Templates?

There are several issues we found downstream. @cathy-zhou can you please raise issues for:

  1. Performance regression while using service templates.
  2. If a K8s nodes have different set of IP addresses, then the template doesn't work as expected. The ovn-controller on nodes with lesser number of IPs will interpret the template variable as a 'raw string' and spews out lot of errors.

For 2, I just filed https://github.com/ovn-org/ovn-kubernetes/issues/4301. We are still investigating the performance issue and will file one once we know more.

cathy-zhou avatar Apr 22 '24 16:04 cathy-zhou

@cathy-zhou can you please rebase?

girishmg avatar May 02 '24 16:05 girishmg

  1. Performance regression while using service templates. @hzhou8 do you have corresponding issue filed for the offloading performance issue when svc_template is enabled? if so, can you post the link to it?

cathy-zhou avatar May 09 '24 17:05 cathy-zhou

Please also add this to https://github.com/ovn-org/ovn-kubernetes/blob/master/helm/ovn-kubernetes/values.yaml so helm can benefit from it. @flavio-fernandes please take a look

cathy-zhou avatar May 09 '24 22:05 cathy-zhou

  1. Performance regression while using service templates. @hzhou8 do you have corresponding issue filed for the offloading performance issue when svc_template is enabled? if so, can you post the link to it?

I don't have an issue reported yet, but here are the details:

When LB template is enabled, a priority 120 flow is missing in the UNSNAT stage that skips the UNSNAT for LB VIPs that are also used as SNAT IP. The flow was added by: 832893bdbb42 ("ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline")

The LB template implementation in OVN didn't handle this properly, and the flow is missing if template is used. (It is actually difficult to handle with template, so this can be a limitation of LB template) - Because of this missing flow, traffic to node-port is not offloaded, and performance drops significantly. But this is only for our downstream environment. The full story is more complex:

The above commit 832893bdbb42 itself has a problem, and this priority-120 flow is again removed by a bug fix: ffe267317c25 ("northd: Don't skip the unSNAT stage for traffic towards VIPs.") This bug fix is also backported to 23.06 but not to our downstream branch yet.

So, enabling LB template resulted in HW offload problem in our downstream where the above bug fix doesn't exist, and disabling LB template restores the HW offload. However, for current upstream branches with the above fix ffe267317c25, it is very likely that the HW offload problem is present regardless of the LB template. This is yet to be verified. If it is verified, then we will need to provide a fix in OVN, and we are not sure yet if we can make LB template work with HW offload, which depends on the new fix in OVN.

hzhou8 avatar May 10 '24 01:05 hzhou8

Please also add this to https://github.com/ovn-org/ovn-kubernetes/blob/master/helm/ovn-kubernetes/values.yaml so helm can benefit from it. @flavio-fernandes please take a look

awesome. TYSM!

flavio-fernandes avatar May 10 '24 15:05 flavio-fernandes

LGTM

flavio-fernandes avatar May 10 '24 15:05 flavio-fernandes