Add metrics to indicate when Contour is connecting to an unsupported Envoy

Open · stevesloka opened this issue 6 years ago · 12 comments

Describe the solution you'd like

We've run into several users who run on master (which isn't a supported setup) or who mix up Contour and Envoy versions when upgrading. Sometimes this works, but at other times a specific version of Contour supports only a specific version of Envoy, leaving users with cryptic failure messages in Envoy's logs.

It would be neat if, when Envoy starts, it could check with Contour that the versions are compatible and, if they are not, block on starting. If this were implemented in the readiness probe and someone upgraded Contour without upgrading Envoy, the Envoy container would fail the probe and block the rollout, without affecting normal traffic.

stevesloka, Apr 09 '19 00:04

Thank you for raising this issue. At the moment, Contour and Envoy are in lockstep: you cannot use an Envoy version older than the one Contour expects, and you shouldn't use a newer one.
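The lockstep rule can be written down as a small comparison. This Go sketch assumes versions are plain `major.minor.patch` strings and follows the strict wording literally: older than expected is unusable, newer is discouraged, and only an exact match is fine. The `classify` function and the `Compatibility` values are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Compatibility describes how a running Envoy relates to the expected version.
type Compatibility int

const (
	Unknown   Compatibility = iota // version string could not be parsed
	Supported                      // exact match
	TooOld                         // older than expected: cannot be used
	TooNew                         // newer than expected: should not be used
)

// parseSemver parses "major.minor.patch" into three integers.
func parseSemver(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("version %q is not major.minor.patch", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// classify compares the running Envoy version with the expected one,
// field by field, stopping at the first difference.
func classify(expected, running string) (Compatibility, error) {
	e, err := parseSemver(expected)
	if err != nil {
		return Unknown, err
	}
	r, err := parseSemver(running)
	if err != nil {
		return Unknown, err
	}
	for i := 0; i < 3; i++ {
		switch {
		case r[i] < e[i]:
			return TooOld, nil
		case r[i] > e[i]:
			return TooNew, nil
		}
	}
	return Supported, nil
}

func main() {
	for _, running := range []string{"1.10.0", "1.11.0", "1.11.1", "master"} {
		c, err := classify("1.11.0", running)
		fmt.Println(running, c, err)
	}
}
```

A real check would likely tolerate patch releases; treating any difference as a mismatch here simply mirrors the statement above.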

This could perhaps be mitigated if we find a solution to #952.

I'm not sure how to make Envoy check the Contour version; xDS doesn't have a notion of a server version.

davecheney, Apr 09 '19 00:04

I haven't looked too much into how to implement this, but I think we'd need another container to act as a healthchecker. It could look at Envoy's version, which is available from its admin page. Contour would need to expose a /version endpoint or something similar for this healthcheck container to query.

This would need more thought to make it better; the above solution is just a quick idea.

stevesloka, Apr 09 '19 00:04

@stevesloka do you think this is possible before Contour 1.0? If so, could you please move it to the appropriate milestone? If we can live without this until after 1.0, please move it to the unplanned milestone and I'll revisit it when we get closer to planning a 1.1 release.

davecheney, Jun 20 '19 04:06

@davecheney I'm going to start with a design to determine how to proceed; I've added this to the v0.15.0 milestone.

stevesloka, Jun 20 '19 16:06

Thanks @stevesloka. Let's talk more about this when we get to 0.15. I'm not sure propagating the Envoy admin page is going to work in all deployment scenarios; I'm thinking about what happens when Envoy and Contour aren't in the same pod. But we might be able to set up a special listener on Envoy and ask it to return its full Server: string, something we normally suppress. That might be a way of getting the version number without having to expose the admin interface.

Let's talk more in July.

davecheney, Jun 20 '19 22:06

@stevesloka do you have any suggestions on how we could implement this? If nothing comes to mind, would you consider moving this to the backlog milestone? We'll revisit it after Contour 1.0.

davecheney, Aug 29 '19 06:08

@davecheney I'm going to backlog this for now.

stevesloka, Aug 29 '19 13:08

Thanks. Let's revisit it in November.

davecheney, Aug 30 '19 10:08

Bringing this back up: could we implement a controller in our operator that checks compatibility? I imagine it would be useful when upgrading through the operator.

xaleeks, Feb 03 '21 21:02

If there is a simple way for Contour to check that Envoys that connect to it are the supported version, that would be great.

We would only be able to log or increment a metric, though; otherwise you would never be able to upgrade. But if there were a metric, the operator could check that the Envoy version is supported as part of its readiness checks.

youngnick, Feb 04 '21 04:02

Sounds good, we'll leave it in Parking Lot 1 in case someone wants to pick this up.

xaleeks, Mar 02 '21 17:03