
[Proposal] Specification for application health

Open · zeusro opened this issue 4 years ago · 3 comments

If I understand correctly, OAM currently uses Traits to define the conditions an application must meet to be considered ready.

Background information

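As we all know, with Kubernetes as today's dominant environment, applications run in the ☁️ as Docker containers. In my view, however, each of the three forms a readiness probe (readinessProbe) can currently take has problems: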

  1. tcpSocket and httpGet are both too simplistic;
  2. exec is rather hacky and not very general, and it requires "hiding" bash inside the image.
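For reference, the documented success conditions of the three probe handlers can be approximated in a few lines of Python: tcpSocket succeeds if a TCP connection can be opened, httpGet succeeds for any status code of at least 200 and below 400, and exec succeeds if the command exits with status 0. This is a minimal sketch of those rules, not the kubelet's implementation; the function names are hypothetical.

```python
import socket
import subprocess


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Succeeds if a TCP connection can be opened, like a tcpSocket probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def http_get_success(status_code: int) -> bool:
    """httpGet probes treat any status code >= 200 and < 400 as success."""
    return 200 <= status_code < 400


def exec_probe(command: list[str], timeout: float = 1.0) -> bool:
    """Succeeds if the command exits with status 0, like an exec probe.

    The command has to exist inside the container image, which is why exec
    probes tend to drag a shell or helper binary into the image.
    """
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):  # missing binary or hung command
        return False
    return result.returncode == 0
```

None of the three looks at how long the application takes to answer, only at whether it answers.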

What's a healthy web application?

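As mentioned above, tcpSocket and httpGet are both too simplistic. For example, suppose application A is a Java Spring Boot web app that listens on port 8080 and exposes a health-check endpoint at /health. Over time its memory slowly leaks, until full GCs (FGC) become slower and slower. By the traditional Kubernetes livenessProbe/readinessProbe definitions it still counts as a healthy app from the infrastructure's point of view; from a business point of view, however, it has become a drag that ought to be refactored or redeployed. As the saying goes: when in doubt, restart; refactor only once the team has scattered.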
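To see the gap concretely, the sketch below starts a hypothetical stand-in for application A whose /health endpoint always answers 200, but only after an artificial delay (simulating long full-GC pauses), and times the request. The server, the delay, and the function names are illustrative assumptions. Kubernetes probes do have a `timeoutSeconds` cutoff (1 second by default), so the case of interest is a response that is slow yet still under that cutoff.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any proxy settings from the environment: health checks go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def timed_health_check(url: str, timeout: float = 5.0) -> tuple[int, float]:
    """GET the health endpoint; return (status code, elapsed seconds)."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises 4xx/5xx responses as HTTPError
    return status, time.monotonic() - start


def start_slow_health_server(delay: float) -> ThreadingHTTPServer:
    """Hypothetical stand-in for app A: /health always answers 200,
    but only after `delay` seconds."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay)  # simulated stall, e.g. a long full GC
            self.send_response(200 if self.path == "/health" else 404)
            self.end_headers()
            self.wfile.write(b"UP")

        def log_message(self, *args):  # keep the demo output quiet
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_slow_health_server(delay=0.5)
    url = f"http://127.0.0.1:{server.server_address[1]}/health"
    status, elapsed = timed_health_check(url)
    # A check that only inspects the status code sees 200 and calls it healthy.
    print(status, f"{elapsed:.2f}s")
    server.shutdown()
    server.server_close()
```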

My idea

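Beyond Kubernetes itself, web applications need a specification that defines what a healthy state is. Take the same Java web server as an example. Its readiness may depend on some application-level preconditions (for example, a sidecar container being ready). Beyond that, having its port available is not enough: if its /health endpoint takes too long to respond, it should also be judged unhealthy.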
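The proposed rule can be sketched as a pure function over observations: every precondition (such as the sidecar) must be ready, the port must accept connections, and /health must return a success status within a latency budget. `HealthSpec`, `Observation`, and `evaluate` are hypothetical names for illustration, not part of OAM or the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class HealthSpec:
    """Hypothetical application-level health rule (not an OAM or k8s type)."""
    max_health_latency_s: float                              # budget for GET /health
    preconditions: list[str] = field(default_factory=list)  # e.g. ["sidecar"]


@dataclass
class Observation:
    ready_dependencies: set[str]  # dependencies currently reported ready
    port_open: bool               # result of a tcpSocket-style check
    health_status: int            # HTTP status returned by GET /health
    health_latency_s: float       # how long GET /health took


def evaluate(spec: HealthSpec, obs: Observation) -> tuple[bool, list[str]]:
    """Return (healthy, reasons); every failed rule contributes one reason."""
    reasons = []
    missing = [d for d in spec.preconditions if d not in obs.ready_dependencies]
    if missing:
        reasons.append(f"preconditions not ready: {', '.join(missing)}")
    if not obs.port_open:
        reasons.append("port not accepting connections")
    if not 200 <= obs.health_status < 400:
        reasons.append(f"/health returned {obs.health_status}")
    if obs.health_latency_s > spec.max_health_latency_s:
        reasons.append(
            f"/health took {obs.health_latency_s:.2f}s "
            f"(budget {spec.max_health_latency_s:.2f}s)"
        )
    return not reasons, reasons
```

Returning the list of reasons rather than a bare boolean makes it easy to tell a slow application apart from one that is down.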

Application health checks also depend on metrics data, and ideally these rules would be captured as a specification in the OAM project.
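A metrics-based rule fits the same mold. As one hypothetical example, assume full-GC pause durations for the app are already being collected (for instance through JMX or a metrics endpoint; collection is out of scope here). A rule could then mark the app unhealthy once the average of its most recent pauses exceeds a threshold. The class name, window size, and threshold are assumptions.

```python
from collections import deque


class GcPauseRule:
    """Hypothetical metrics-based health rule: unhealthy when the average of
    the last `window` full-GC pause durations exceeds `max_avg_s` seconds."""

    def __init__(self, window: int, max_avg_s: float):
        self.samples = deque(maxlen=window)  # only the most recent pauses are kept
        self.max_avg_s = max_avg_s

    def record(self, pause_s: float) -> None:
        """Feed in one observed full-GC pause, in seconds."""
        self.samples.append(pause_s)

    def healthy(self) -> bool:
        if not self.samples:
            return True  # no full GC observed yet
        return sum(self.samples) / len(self.samples) <= self.max_avg_s
```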

Links:

  1. https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
  2. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.14/#probe-v1-core

zeusro · May 09 '20 06:05

Interesting idea, so you're suggesting to be able to define a latency boundary and if it's longer than that it's considered unhealthy?

tomkerkhove · May 11 '20 07:05

Quoting tomkerkhove: "Interesting idea, so you're suggesting to be able to define a latency boundary and if it's longer than that it's considered unhealthy?"

Currently, we use wait-for-it to wait for the sidecar container to be ready.
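wait-for-it is a shell script that blocks until a given host:port accepts TCP connections. The same idea in Python, as a minimal sketch (the function name and its defaults are assumptions, not wait-for-it's own options):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` expires.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            with socket.create_connection((host, port),
                                          timeout=min(interval, remaining)):
                return True
        except OSError:
            # Not listening yet: back off, but never sleep past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

An entrypoint could call, say, `wait_for_port("127.0.0.1", 8081)` (a hypothetical sidecar port) before starting the main process. Note that an open port only shows the sidecar is listening, not that it is fully ready.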

I have also heard about the sidecar container lifecycle changes in Kubernetes 1.18.


In practice, though, reality is much more complicated than that.

That is why Alibaba Group created the PouchContainer project to advance container technology.

🤣 I think OAM could do better than PouchContainer

zeusro · May 12 '20 02:05

At Google this is called healthz, or "z-pages", and there are several OSS implementations in the community; please see: https://stackoverflow.com/questions/43380939/where-does-the-convention-of-using-healthz-for-application-health-checks-come-f

The healthz convention is not in the scope of OAM, but you are right: an OAM trait could definitely support it if there were a standard somewhere.
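A healthz-style endpoint typically aggregates several named checks into a single HTTP status. The sketch below is loosely modeled on the verbose output of Kubernetes' own health endpoints; the function name, the 503 choice, and the exact body format are illustrative assumptions, not an OAM trait or an established standard.

```python
from typing import Callable


def healthz(checks: dict[str, Callable[[], bool]]) -> tuple[int, str]:
    """Run named checks and return (HTTP status, plain-text body).

    200 when every check passes, 503 otherwise; the body lists each check
    so an operator can see which one failed.
    """
    lines = []
    all_ok = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:  # a crashing check counts as a failure
            ok = False
        all_ok = all_ok and ok
        lines.append(f"[{'+' if ok else '-'}]{name} {'ok' if ok else 'failed'}")
    lines.append("healthz check passed" if all_ok else "healthz check failed")
    return (200 if all_ok else 503), "\n".join(lines)
```

Latency budgets or metrics rules would plug in as additional named checks.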

resouer · May 22 '20 04:05