
conformance: Conformance tests need a way to test that a percentage of requests succeed or fail

Open youngnick opened this issue 3 years ago • 6 comments

What would you like to be added: In #1243, I added some conformance testing around new BackendRef tests, but one case we couldn't catch is the following:

  • There are two backendRefs in the HTTPRoute backendRef list, with equal weights
  • One is invalid
  • Half of the requests sent through the implementation must get a 500.

Why this is needed: Weighted load balancing is part of the spec; having some tooling that lets us test it as part of conformance will help a lot.

Downside: it's going to be tricky to write these tests so they aren't flaky. I suspect we'll need to provide a target percentage value, add some wiggle room around it, and then send some number of requests. It seems likely to me that the details will matter a lot here.
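
Purely as an illustration of the shape such a check could take (not anything the suite provides today - the `gatewayAddr`, request count, target rate, and tolerance below are all assumptions), a rough sketch in Go:

```go
package conformance_sketch

import (
	"net/http"
	"testing"
	"time"
)

// expectResponseRatio is a hypothetical helper: it sends `requests` requests
// through the gateway and asserts that the fraction of 500 responses falls
// within `tolerance` of `wantRate` (e.g. 0.5 ± 0.05 for two equal-weight
// backendRefs where one is invalid).
func expectResponseRatio(t *testing.T, gatewayAddr string, requests int, wantRate, tolerance float64) {
	t.Helper()

	client := &http.Client{Timeout: 10 * time.Second}
	got500 := 0
	for i := 0; i < requests; i++ {
		resp, err := client.Get("http://" + gatewayAddr + "/")
		if err != nil {
			t.Fatalf("request %d failed: %v", i, err)
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusInternalServerError {
			got500++
		}
	}

	rate := float64(got500) / float64(requests)
	if rate < wantRate-tolerance || rate > wantRate+tolerance {
		t.Errorf("got %.1f%% 500s over %d requests, want %.0f%% ± %.0f%%",
			rate*100, requests, wantRate*100, tolerance*100)
	}
}
```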

May need to be a GEP, let's discuss here first.

youngnick avatar Jul 05 '22 01:07 youngnick

This definitely feels like it would be flaky to test - would using filters to direct 100% of traffic first to a valid backend and then to an invalid backend for the same route be a viable alternative to this, or is that not sufficient?

mikemorris avatar Jul 05 '22 16:07 mikemorris

Well, the spec for backendRef mandates the behavior above (that is, if there are multiple backends and one is invalid, the invalid one should instead produce an equivalent percentage of 500 errors), so ideally we should have a way to test in conformance that implementations do that.

That's why I suggested a target percentage value plus some wiggle room (maybe 50% plus or minus 5%, or something like that, for the two-backend case).

Having filters direct traffic doesn't meet the spec item that we're trying to test, unfortunately.

youngnick avatar Jul 07 '22 05:07 youngnick

I agree that both of the things you're describing are necessary and related. We should probably track both of these here or create a separate issue for the test @mikemorris described. To ensure this isn't flaky we'll likely need a relatively large number of requests. This kind of test may require a "slow" label like upstream Kubernetes tests so we can skip it for faster test runs.
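
As a rough back-of-the-envelope on "relatively large" (my arithmetic, not anything from the spec): treating each request as a coin flip with p = 0.5, the observed 500-rate over n requests has a standard deviation of sqrt(p(1-p)/n) = 0.5/sqrt(n), so keeping three standard deviations inside a ±5% window needs 1.5/sqrt(n) <= 0.05, i.e. roughly n >= 900 requests. A sketch of gating that behind Go's built-in short mode as a stand-in for a "slow" label (the helper and the address variable are assumptions carried over from the earlier sketch):

```go
// gatewayAddr would come from the suite's infrastructure setup; a literal
// placeholder is used here only to keep the sketch self-contained.
var gatewayAddr = "localhost:8080"

// Sketch only: skip the expensive weighted-traffic check in short runs,
// analogous to a "slow" label in upstream Kubernetes tests.
func TestWeightedInvalidBackend(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping slow weighted-traffic conformance check in -short mode")
	}

	// 1000 requests keeps three standard deviations of a fair coin flip
	// comfortably inside the ±5% tolerance estimated above.
	expectResponseRatio(t, gatewayAddr, 1000, 0.5, 0.05)
}
```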

robscott avatar Jul 07 '22 06:07 robscott

I agree that we should create another issue for the test Mike described.

youngnick avatar Jul 08 '22 03:07 youngnick

Well, looking at HTTPRouteFilter, it appears the functionality I was envisioning doesn't really exist (so I'm not opening an issue to test it, hah). HTTPRequestMirrorFilter is somewhat similar, but it would ignore the actual configured backendRefs on the HTTPRoute. I was thinking of something like an HTTPRequestLabelSelectorFilter that uses https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ to filter backendRefs by metadata.
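
For concreteness, a purely hypothetical sketch of what that might look like - this type does not exist in gateway-api, and the name and field are invented for illustration only:

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// HTTPRequestLabelSelectorFilter is a hypothetical filter (not part of the
// Gateway API) that would restrict a rule's traffic to the subset of its
// configured backendRefs whose referenced objects match the label selector.
type HTTPRequestLabelSelectorFilter struct {
	// Selector is a standard Kubernetes label selector applied to the
	// metadata of the objects referenced by the rule's backendRefs.
	Selector metav1.LabelSelector `json:"selector"`
}
```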

mikemorris avatar Jul 08 '22 19:07 mikemorris

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '22 19:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 05 '22 19:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 05 '22 20:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 05 '22 20:12 k8s-ci-robot