paasta
dar-1636: handle envoy admin timeout
Hmmm, not sure if handling it that way is ideal
We should probably show the user that a particular host is not behaving nicely, instead of logging internally and just returning a potentially incomplete backends list
I don't think knowing that a particular host is bad is really helpful for an end user, hence I'm just logging the error and moving on there. The error should be reported to us either by an active healthcheck checking envoy, or by a Splunk query that checks for this.
Wouldn't that impact the information reported by paasta itself though, potentially skipping pods/instances that should be shown?
Well, it wouldn't skip anything; it would just not crash if an envoy instance is timing out. I don't think it makes sense to show that error to users, as it's not actionable for them. The only thing they can do is reach out to us, and we should have direct monitoring for this instead.
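For reference, a minimal sketch of the log-and-continue behavior discussed above. This is not the code from this PR: `collect_backends` and the injected `fetch` callable are hypothetical names, and the sketch assumes `fetch` raises the builtin `TimeoutError` (a real HTTP client would likely raise its own timeout exception, which would need to be caught instead). It also returns the list of hosts that timed out, so a caller could choose to surface them rather than only logging.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

log = logging.getLogger(__name__)


def collect_backends(
    hosts: Iterable[str],
    fetch: Callable[[str], List[dict]],  # hypothetical: queries one host's envoy admin
) -> Tuple[Dict[str, List[dict]], List[str]]:
    """Query each host; on timeout, log a warning and keep going.

    Returns (backends_by_host, timed_out_hosts).
    """
    backends: Dict[str, List[dict]] = {}
    timed_out: List[str] = []
    for host in hosts:
        try:
            backends[host] = fetch(host)
        except TimeoutError:
            # Assumption: fetch signals a timeout with the builtin TimeoutError.
            log.warning("envoy admin request to %s timed out, skipping", host)
            timed_out.append(host)
    return backends, timed_out
```

With this shape, one slow envoy instance no longer crashes the whole call, and the second return value leaves open the question of whether to show the affected hosts to the user or only rely on monitoring.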
Cleaning up and closing some very old PRs. Please re-open or nudge me if you’re still planning to work on this.