envoy icon indicating copy to clipboard operation
envoy copied to clipboard

configs: Fix default config

Open phlax opened this issue 4 years ago • 21 comments
trafficstars

Signed-off-by: Ryan Northey [email protected]

Commit Message: configs: Fix default config Additional Description:

rather than proxying to the envoy website which often responds with a redirect this updates the default config to just post a link to the website and docs

Risk Level: Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Deprecated:] [Optional API Considerations:]

phlax avatar Jul 12 '21 16:07 phlax

cc @mattklein123 @alyssawilk

phlax avatar Jul 12 '21 16:07 phlax

lgtm, //test/config_test:example_configs_test is failing due to changes in cluster config.

dmitri-d avatar Jul 12 '21 18:07 dmitri-d

//test/config_test:example_configs_test is failing due to changes in cluster config

i thought i had fixed it, but obviously not, ill check again...

phlax avatar Jul 12 '21 18:07 phlax

hmm, i see it wants a cluster to be configured

im wondering if its valid to have 0 clusters configured, certainly Envoy starts/works with the config in the PR

phlax avatar Jul 12 '21 18:07 phlax

Are you still having trouble after the scheme header landed? If so I can try to help debug

yep

in my pr #16899 it loads the default config loads of times and does a grep test to check the server is responding with "default" output - ie envoy website content

the failure is a bit wierd - it always fails on the arm build, it usually fails on the x86 build, it very rarely (but it happens) fails on only some on the x86 build

testing locally it hardly ever fails, but does some times

ive come to the conclusion that there is some caching going on and perhaps different netlify endpoints respond differently accordingly

i havent tested with google again since you landed the scheme mangling but previously it was similar (i didnt test extensively)

there is another pr here - to test it in isolation (which is flakey) #17200

here is an example of it failing on azp - https://dev.azure.com/cncf/envoy/_build/results?buildId=81664&view=logs&j=f1a49366-51c0-540f-0294-f73c2bb311a9&t=7fd036c9-61ec-5b59-3645-e90950dc67a6&l=1760

its probs not that helpful as it doesnt show you the response/code but its trying to redirect

phlax avatar Jul 13 '21 19:07 phlax

there is value in having an example of Envoy forwarding to another website.

what about we add a loopback proxy listener in the config

i quite like the idea that the example will work without a network connection, altho its a bit "Envoy is a proxy, but thats not what we are going to show you" i guess if it doesnt proxy at all

the other thing is dealing with historical versions which have the same problem (worse afaict) - this pr would be ~easy to backport i think

phlax avatar Jul 13 '21 21:07 phlax

heres a more helpful failure on the packaging pr https://dev.azure.com/cncf/envoy/_build/results?buildId=82041&view=logs&j=7d299c4a-9a35-59b9-677d-f6996e8fc292&t=2e3bbaf1-9c4a-5c1a-ee33-a8cc5407b4d9&l=5227

you can see the logs and response there

phlax avatar Jul 14 '21 09:07 phlax

Didn't squeeze it in this week but I've blocked off some time Monday to check it out.

alyssawilk avatar Jul 15 '21 20:07 alyssawilk

i added a loopback, so there is a proxy in this example now

phlax avatar Jul 17 '21 11:07 phlax

Looks like CI is still unhappy? I'm fine with envoy_demo being a local reply, but I think the point of *.proxy.yaml is to provide an example of proxying data from one place to another. I think we'd be better off switching websites to not-really-proxying towards envoyproxy.io

/wait

alyssawilk avatar Jul 19 '21 21:07 alyssawilk

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Aug 19 '21 00:08 github-actions[bot]

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Aug 26 '21 04:08 github-actions[bot]

@mattklein123 @alyssawilk

i tested again today to see whether any changes at netlify had resolved the problem that this addresses, unfortunately it doesnt seem to have:

PackagesDistroChecker ERROR [distros] [debian_bullseye/envoy-1.20:proxy-responds] Test failed
Envoy did not respond correctly
Response was
Redirecting to https://www.envoyproxy.io/

not sure about the best way forward, but im thinking these are the options:

  • fix whatever is breaking with this pr, and land this solution
  • figure out what headers (?) are being sent to netlify that means sni is not working as expected
  • accept that http -> https is probs not the best pattern to use in the default demo/config and figure out an alternative to proxy to (i guess reverting to google is an option, but im pretty sure ive seen it display similar redirect issues)

one way or another i kinda need to be able to test that envoy is responding correctly with the default config - the current situation means that it is 50/50 whether you get a redirect or the actual page

im also concerned that the problems netlify have had with redirect loops are caused by the default config in the docker image causing it to cache a redirect, and then that gets served to others - i may be wrong about this, but it is consistent with my testing.

personally, i quite like the solution here, and im guessing it wont be too hard to fix ci, but im really open to whatever we can come up with to fix the default config

phlax avatar Sep 07 '21 12:09 phlax

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/17296 was synchronize by phlax.

see: more, trace.

ci should be fixed

phlax avatar Sep 07 '21 13:09 phlax

@alyssawilk @mattklein123 this is no longer required for my current work as the netlify issues appears to be fixed - so im happy to close

that said, i quite like the idea of the default config not requiring a network connection, especially in the context of an OS/package install

im also left wondering - if my theory that our default config was the cause of the redirect loop is correct - what does envoy's sni implementation expose that leaks the information about http that i guess is causing the redirect - my assumption is that an sni request originating from either http or https should look the same to the upstream, but maybe this is wrong

phlax avatar Sep 08 '21 11:09 phlax

I find having a default config for a proxy actually do proxying more useful, so I'd be inclined to close this off than land it.

upstream http vs https should be exposed via x-forwarded-proto (though our default config overrides that AFIK) but that wasn't the bug with netlify because it was happening with Chrome and an https url which wouldn't be doing x-forward-proto: http.. So I am baffled about the netlify issue but glad it resolved itself!

alyssawilk avatar Sep 08 '21 13:09 alyssawilk

alternately if you want best of both worlds, we could try to hit up envoyproxy.io and have a custom error rewrite handler which notes lack of network connectivity and sends a canned response? /wait

alyssawilk avatar Sep 08 '21 13:09 alyssawilk

but that wasn't the bug with netlify because it was happening with Chrome and an https url which wouldn't be doing x-forward-proto: http.

i believe due to caching

it was on the days where i was triggering the problem that others (re-)reported the browser issue

(hence the redirect loop - ie http -> https request happens, redirect is cached for https response, new https request happens, fail)

phlax avatar Sep 08 '21 14:09 phlax

alternately if you want best of both worlds, we could try to hit up envoyproxy.io and have a custom error rewrite handler which notes lack of network connectivity and sends a canned response?

wfm - ill update as i have some time

phlax avatar Sep 08 '21 14:09 phlax

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Oct 08 '21 16:10 github-actions[bot]

this is still an issue as for CI (and i guess end users) as the website can still be flakey

phlax avatar Aug 12 '22 10:08 phlax

we could have default config be to proxy and retry config serve from cache? I wouldn't have CI test that we can reach envoyproxy.io. but I'd fix CI to not e2e test the internet, rather than replace a useful config.

alyssawilk avatar Aug 30 '22 13:08 alyssawilk

we could have default config be to proxy and retry config serve from cache?

i guess that would be possible - iiuc then adding adding a warm cache to prevent ci from testing web connect

in this case we are testing what an end user is getting in the package - what CI is demonstrating is in a way end user experience - so i think there are 2 separate issues - website flakiness, and the config

re the config my current thought - altho i havent had time to pursue it was something like - a static page served by default with a proxied link to the website on a separate path/port - that way the default page is at least not proxying to web etc immediately but at the same time still demonstrates a proxy as part of the default config

phlax avatar Aug 30 '22 14:08 phlax