contour icon indicating copy to clipboard operation
contour copied to clipboard

Enable ingress conformance tests

Open michmike opened this issue 6 years ago • 24 comments

Once Ingress v1 support is complete, we should enable the ingress conformance tests in CI

michmike avatar Dec 13 '19 00:12 michmike

Prerequisite for #1944

jpeach avatar Dec 13 '19 00:12 jpeach

Blocked by #403

jpeach avatar Jan 15 '20 01:01 jpeach

See: https://github.com/kubernetes-sigs/ingress-controller-conformance

And a separate effort: https://github.com/aledbf/ingress-conformance

jpeach avatar Jan 15 '20 21:01 jpeach

Ingress conformance is basically blocked. There's no agreement upstream on what the harness should look like, so verification checks can't be committed.

jpeach avatar Feb 19 '20 05:02 jpeach

Removing this from the prioritize backlog because it is blocked upstream.

jpeach avatar Jul 15 '20 01:07 jpeach

@jpeach i would prefer we add the blocked tag and leave it in the backlog. we still want to do it at the current priority or as soon as possible given the upstream block. can you please make the changes?

michmike avatar Jul 15 '20 02:07 michmike

@jpeach i would prefer we add the blocked tag and leave it in the backlog. we still want to do it at the current priority or as soon as possible given the upstream block. can you please make the changes?

I can move it back, but I don't see the benefit of having issues in the prioritized queue that can't be worked on.

jpeach avatar Jul 15 '20 02:07 jpeach

Does this still look blocked given the state of https://github.com/kubernetes-sigs/ingress-controller-conformance? (theres some example of running conformance with sonobuoy and kind etc. https://github.com/kubernetes-sigs/ingress-controller-conformance/blob/master/examples/kind/.github/workflows/main.yaml)

sunjayBhatia avatar Dec 04 '20 03:12 sunjayBhatia

Does this still look blocked given the state of https://github.com/kubernetes-sigs/ingress-controller-conformance? (theres some example of running conformance with sonobuoy and kind etc. https://github.com/kubernetes-sigs/ingress-controller-conformance/blob/master/examples/kind/.github/workflows/main.yaml)

See https://github.com/projectcontour/contour/issues/1944

jpeach avatar Dec 04 '20 03:12 jpeach

Test results from a couple runs: ingress-conformance-run-1.tar.gz ingress-conformance-run-2.tar.gz

Tests fail intermittently with connection refused, my guess is that is b/c the address/status on the Ingress object is updated before Envoy listeners are actually updated (listeners were torn down from a previous test etc.)

~~It doesn't appear that tests are timing out but only 3 of the 5 features are being run~~ (Using presumably an older ingress-conformance container image)

Runs with the latest ingress-conformance image: ingress-conformance-run-3.tar.gz ingress-conformance-run-4.tar.gz

Failing features:

  • ingress class:
    • Scenario: An Ingress with an invalid ingress class should not send traffic to the matching backend service
      • error Then The Ingress status should not contain the IP address or FQDN waiting for Ingress status should not return an IP address or FQDN
      • We do not create an ingress class by default in our example deployment, the test creates an Ingress with an empty ingress class annotation (but a specified IngressClass field) and we are incorrectly handling it: https://github.com/kubernetes-sigs/ingress-controller-conformance/blob/79bd068cbb31d77c2a060f61eea795f9791a3d48/features/ingress_class.feature#L17
      • relevant issue: https://github.com/projectcontour/contour/issues/2146
  • default backend:
    • Some intermittent connection refused failures, seemingly nothing legitimate, default backends on Ingress seems to work correctly
    • Some intermittent 503s returned instead of 200s
  • host rules:
    • Scenario: An Ingress with a host rule should send TLS traffic to the matching backend service
      • Have not gotten this one to pass though it should, always get a connection refused error connecting to port 443
    • Scenario: An Ingress with a wildcard host rule should send traffic to the matching backend service
      • When I send a GET request to http://bar.foo.com, error Then the response status-code must be 200
      • expected status code 200 but 404 was returned
  • path rules:
    • Intermittent connection refused failures
    • Some of the 404 failures below seem like intermittent failures as well, they passed in a few previous runs
    • Scenario: An Ingress with exact path rules should send traffic to the matching backend service
      • fails with: expected status code 200 but 503 was returned
    • Scenario: An Ingress with exact path rules should not match requests with trailing slash
      • fails with: expected status code 404 but 200 was returned
    • Scenario: An Ingress with prefix path rules should ignore the request trailing slash and send traffic to the matching backend service
      • fails with: expected status code 200 but 404 was returned
    • Scenario: An Ingress with prefix path rules should and send traffic to the matching backend service
      • fails with expected status code 200 but 503 was returned
    • Scenario: An Ingress with prefix path rules should match each labels string prefix
      • fails with 200 instead of 404

sunjayBhatia avatar Dec 10 '20 18:12 sunjayBhatia

@sunjayBhatia IIRC at least some of these tests are for ingress V1, which isn't implemented in Contour.

jpeach avatar Dec 10 '20 22:12 jpeach

I see mention of a design doc here: https://github.com/projectcontour/contour/issues/2139, is this in flight/somewhere I can see/help with? cc @youngnick

sunjayBhatia avatar Dec 10 '20 22:12 sunjayBhatia

I updated #2139 with some further info, basically, the design for IngressClass is very similar to GatewayClass (the object), which is why I was initially suggesting I do them both. Happy to collaborate on something if you are interested @sunjayBhatia. Ping me and we can discuss in a high-bandwidth way somewhere.

youngnick avatar Dec 21 '20 23:12 youngnick

Relevant issue https://github.com/kubernetes-sigs/ingress-controller-conformance/issues/87 along with https://github.com/kubernetes-sigs/service-apis/pull/516

sunjayBhatia avatar Jan 12 '21 02:01 sunjayBhatia

I guess you’re working on this already @sunjayBhatia ? can you tag a release that you feel is appropriate?

xaleeks avatar Apr 13 '21 17:04 xaleeks

This seems like the long pole here since no-one is responding: https://github.com/kubernetes-sigs/ingress-controller-conformance/pull/88

sunjayBhatia avatar Apr 13 '21 18:04 sunjayBhatia

@sunjayBhatia, I'd recommend just pinging Bowei on Kubernetes Slack about this one, since it's a one-liner.

youngnick avatar Apr 19 '21 04:04 youngnick

Running ingress conformance is pretty consistently flaky, especially in CI since we create a kind cluster and deploy contour/envoy from scratch before running the tests, without any warming of the HTTP/HTTPS listeners

The ingress conformance tests wait for an Ingress to have its load balancer status address set before sending requests for the tests, however in our current architecture we have a goroutine separate from the Ingress processor that sets status on Ingress objects. This means the Ingress has an address possibly before it is programmed in Envoy and the tests move along accordingly, often sending requests that fail b/c Contour hasn't programmed Envoy yet or Envoy's listeners are still starting. The conformance suite also does not ever retry, it sends one request for each check.

Options here:

  • Change Contour to only set status once we have processed an Ingress
    • We don't have ACK/NACK so we don't really know when the Ingress is fully configured, but we could move the status updating out from the separate informer/goroutine to the ingress processor as a start
    • There isn't actually anything in the Ingress API as far as I can tell that says status has to be set only when the Ingress is "ready" or "admitted", which makes the fact we would have to do this only caused by the quirks/implicit requirements the test suite adds
  • Change tests to retry or wait longer for Ingress to be programmed in Envoy
    • see second sub-point for the above

sunjayBhatia avatar May 19 '21 19:05 sunjayBhatia

Seems like this is another victim of the underspecced nature of the Ingress API. If only there was some replacement that didn't have the same problems! 😝

More seriously, I until we have ACK support, we don't have any way to check if the programming is complete, and even if we did, there's no way to tell from the object (since there's no Conditions). So I think we'll have to just change the tests to retry some number of times or wait. Can you see what other implementations do for their tests here?

youngnick avatar May 20 '21 06:05 youngnick

A quick run through and I don't see anything in CI for other Ingress controllers, but I didn't do the most digging

sunjayBhatia avatar May 20 '21 19:05 sunjayBhatia

I think the most important thing is that Contour can pass the conformance tests eventually, once things are stable. So retries or longer timeouts are okay in my book.

youngnick avatar May 21 '21 00:05 youngnick

https://github.com/kubernetes-sigs/ingress-controller-conformance/issues/92

sunjayBhatia avatar May 24 '21 19:05 sunjayBhatia

waiting on upstream ^ will make a PR but the repo has not been very active so even if I do it probably won't make it for a while

should probably pull this out of 1.16

sunjayBhatia avatar May 25 '21 21:05 sunjayBhatia

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions[bot] avatar Apr 19 '24 00:04 github-actions[bot]

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]