Cluster health check failure can get stuck

Open milas opened this issue 2 years ago • 5 comments

Expected Behavior

  • If the cluster becomes unhealthy and then healthy again, Tilt reflects that both in the cluster pop-up in the UI and by "unholding" any resources waiting on the cluster

Current Behavior

  • Possible for health check to get stuck in a failing state

Steps to Reproduce

This is a recent feature and we've only had this reported once via Slack, but the Tilt UI was showing an error on the /livez check, and the user reported that the same request was succeeding via curl at that point.

[Screenshot: Screen Shot 2022-04-22 at 9 47 08 AM]

They'd mentioned getting into the state after having put their laptop to sleep for the day and returning the next morning.

milas • Apr 25 '22 13:04

a few more developers of ours saw this recently - it would be nice to improve this because getting stuck here seems like a regression caused by the (otherwise great) health check functionality being added

andymartin-sch • May 06 '22 13:05

@andymartin-sch Thanks for the extra reports - agreed this is not the experience we want here; I'm hoping to include at least some form of remediation in our release today.

In the cases you've seen, has the error shown in the Tilt UI been similar to that in the issue above? If so, do you know if anyone tried manually accessing the endpoint (e.g. curl https://..../livez) and whether that was successful?

milas • May 06 '22 13:05

> In the cases you've seen, has the error shown in the Tilt UI been similar to that in the issue above?

yeah pretty much the exact same

> If so, do you know if anyone tried manually accessing the endpoint (e.g. curl https://..../livez) and whether that was successful?

I don't think so but we can do that going forward and will let you know - thanks!!

andymartin-sch • May 06 '22 13:05

ah, one developer just said:

> When I hit this, I went to that endpoint in my browser and it returned “ok”

andymartin-sch • May 06 '22 13:05

A couple of improvements/fixes went into v0.29.0 (released May 6) - please let me know if you still see the issue after upgrading!

milas • May 09 '22 14:05