Configuring /actuator/health as the liveness probe causes abnormal pod restarts when the application's requests to Nacos fail
Which Component
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
<version>2.3.12.RELEASE</version>
</dependency>
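Describe the bug A Kubernetes application uses /actuator/health as its liveness probe. When the application's requests to Nacos fail, the pod is restarted abnormally, even though the application itself is healthy and can still accept external requests.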
To Reproduce Steps to reproduce the behavior:
- Start a simple Spring Cloud Alibaba application and add the Nacos config, Nacos discovery, and actuator dependencies
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-config</artifactId>
</dependency>
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
- Visit http://localhost:8088/actuator/health
{
"status": "UP",
"components": {
"discoveryComposite": {
"status": "UP",
"components": {
"discoveryClient": {
"status": "UP",
"details": {
"services": [
"provider"
]
}
}
}
},
"diskSpace": {
"status": "UP",
"details": {
"total": 499963174912,
"free": 285202673664,
"threshold": 10485760,
"exists": true
}
},
"nacosConfig": {
"status": "UP"
},
"nacosDiscovery": {
"status": "UP"
},
"ping": {
"status": "UP"
},
"refreshScope": {
"status": "UP"
}
}
}
- Add a network whitelist policy to simulate a lost connection between the application and Nacos, then visit http://localhost:8088/actuator/health again
{
"status": "DOWN",
"components": {
"discoveryComposite": {
"status": "UP",
"components": {
"discoveryClient": {
"status": "UP",
"details": {
"services": []
}
}
}
},
"diskSpace": {
"status": "UP",
"details": {
"total": 499963174912,
"free": 284207394816,
"threshold": 10485760,
"exists": true
}
},
"nacosConfig": {
"status": "UP"
},
"nacosDiscovery": {
"status": "DOWN"
},
"ping": {
"status": "UP"
},
"refreshScope": {
"status": "UP"
}
}
}
- actuator now reports the application's health status as "DOWN", although the application can in fact still serve external requests
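The status flip between the two responses above, and the resulting restart, can be traced with a small model. This is a minimal sketch in plain Java with no Spring dependency; the class and method names are hypothetical and this is not Spring Boot's or the kubelet's code. It assumes the defaults: component statuses are aggregated by severity in the order DOWN, OUT_OF_SERVICE, UP, UNKNOWN; DOWN and OUT_OF_SERVICE are served as HTTP 503; and a Kubernetes HTTP probe counts a response as successful only when the status code is at least 200 and below 400.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not Spring Boot or kubelet source code.
public class HealthAggregationSketch {

    // Assumed default severity order, most severe first.
    private static final List<String> SEVERITY_ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Returns the most severe status reported by any component, or UNKNOWN if none match. */
    public static String aggregate(Map<String, String> components) {
        for (String status : SEVERITY_ORDER) {
            if (components.containsValue(status)) {
                return status;
            }
        }
        return "UNKNOWN";
    }

    /** Assumed default mapping of an overall health status to an HTTP status code. */
    public static int httpStatusFor(String status) {
        return (status.equals("DOWN") || status.equals("OUT_OF_SERVICE")) ? 503 : 200;
    }

    /** An HTTP probe succeeds for status codes in the range [200, 400). */
    public static boolean probeSucceeds(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 400;
    }

    public static void main(String[] args) {
        // Top-level component statuses from the second response in the reproduction steps.
        Map<String, String> components = Map.of(
                "discoveryComposite", "UP",
                "diskSpace", "UP",
                "nacosConfig", "UP",
                "nacosDiscovery", "DOWN",
                "ping", "UP",
                "refreshScope", "UP");

        String overall = aggregate(components);
        int http = httpStatusFor(overall);
        System.out.println(overall + " -> HTTP " + http
                + " -> probe succeeds: " + probeSucceeds(http));
    }
}
```

Under these assumptions a single DOWN component is enough to fail the whole endpoint, and once the liveness probe has failed `failureThreshold` times in a row the kubelet restarts the container, even though the application can still serve traffic.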
Expected behavior When the connection between an SCA application and Nacos fails, the /actuator/health status should not be "DOWN".
Additional context SCA Version 2.2.9.RELEASE
In general, the "Liveness" state should not be based on external checks, such as Health checks. If it did, a failing external system (a database, a Web API, an external cache) would trigger massive restarts and cascading failures across the platform.
I think there's no need to change the implementations of NacosConfigHealthIndicator and NacosDiscoveryHealthIndicator; they both work well for telling /actuator/health the status of Nacos.
Spring Boot has provided Liveness and Readiness probes since 2.3.x, so users should use /actuator/health/liveness and /actuator/health/readiness for container lifecycle checks instead of /actuator/health:
- https://docs.spring.io/spring-boot/docs/3.2.1/reference/htmlsingle/#features.spring-application.application-availability
- https://docs.spring.io/spring-boot/docs/3.2.1/reference/htmlsingle/#actuator.endpoints.kubernetes-probes
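Why the dedicated probe endpoints are not affected by a Nacos outage can be sketched as follows. This is a simplified model in plain Java, not Spring Boot's implementation: the class and method names are hypothetical, and it assumes the default group membership, in which the liveness group contains only the `livenessState` indicator and the readiness group only `readinessState`. As a simplification, it treats any status other than UP as unhealthy.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; not Spring Boot source code.
public class ProbeGroupSketch {

    /**
     * A group is considered healthy only if every component that belongs to it reports UP.
     * Components outside the group are ignored entirely.
     */
    public static boolean groupIsUp(Set<String> members, Map<String, String> components) {
        return components.entrySet().stream()
                .filter(entry -> members.contains(entry.getKey()))
                .allMatch(entry -> entry.getValue().equals("UP"));
    }

    public static void main(String[] args) {
        // Nacos is unreachable, but the application itself is running and accepting traffic.
        Map<String, String> components = Map.of(
                "livenessState", "UP",
                "readinessState", "UP",
                "nacosConfig", "UP",
                "nacosDiscovery", "DOWN");

        // /actuator/health aggregates every component, so the Nacos failure shows up.
        System.out.println("health:    " + groupIsUp(components.keySet(), components));

        // Assumed default groups: each probe endpoint looks only at its own state indicator.
        System.out.println("liveness:  " + groupIsUp(Set.of("livenessState"), components));
        System.out.println("readiness: " + groupIsUp(Set.of("readinessState"), components));
    }
}
```

In this model the liveness and readiness results stay healthy while the full health endpoint does not, which is the separation the comment above relies on: the Nacos indicators keep reporting the truth on /actuator/health without driving container restarts.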
Agree.
We need one article on sca.aliyun.com giving best practices for deploying Spring Cloud Alibaba on Kubernetes.
Thanks. Could you give some suggestions for users on Spring Boot versions lower than 2.3.x?
This issue has been open 30 days with no activity. This will be closed in 7 days.
hi, @ZhXZhao I have written a best practice here. Do you have time to review?
https://github.com/yuluo-yx/sca-k8s-demo/tree/openfeign
Okay, that's great