Cannot delete persistent instance, although every response is 'OK'
Describe the bug
Calling DELETE /nacos/v1/ns/instance to delete a persistent instance returns 200, but the instance is not deleted. After the user reported the problem, I called the interface twice more with the same client. The second request, at 10:59:41, deleted the instance successfully. The third request, at 11:02:00, produced the following naming-server log entry, which also shows that the second request had succeeded:
2022-07-26 11:02:00,420 WARN remove instance from non-exist client: 61.147.184.72:18741#false
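For reference, the delete call described above can be reproduced outside the Nacos client. A minimal sketch using only the Python standard library, assuming the Nacos Open API parameters `namespaceId`, `serviceName`, `groupName`, `clusterName`, `ip`, `port` and `ephemeral`; the server address and the helper names are hypothetical, and the other values are taken from the logs in this issue:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_deregister_url(server, *, namespace_id, service_name, group_name,
                         cluster_name, ip, port):
    """Build the Open API URL that deletes one persistent instance."""
    params = {
        "namespaceId": namespace_id,
        "serviceName": service_name,
        "groupName": group_name,
        "clusterName": cluster_name,
        "ip": ip,
        "port": port,
        # ephemeral=false targets the persistent instance, as in the access log.
        "ephemeral": "false",
    }
    return f"{server}/nacos/v1/ns/instance?{urlencode(params)}"


def deregister(url, timeout=5.0):
    """Send the DELETE request and return (HTTP status, response body)."""
    with urlopen(Request(url, method="DELETE"), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Hypothetical server address; the remaining values come from the logs in this issue.
url = build_deregister_url(
    "http://127.0.0.1:8848",
    namespace_id="prod", service_name="bservice", group_name="web",
    cluster_name="China", ip="61.147.184.72", port=18741,
)
```

A 200 response only says the server accepted the request; as this issue shows, it is worth confirming separately that the instance is actually gone.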
The access log is as follows:
10.201.110.5 - - [26/Jul/2022:10:38:49 +0800] "DELETE /nacos/v1/ns/instance?app=unknown&namespaceId=prod&port=18741&clusterName=China&ip=x.x.x.x&ephemeral=false&serviceName= bservice&encoding=UTF-8&nofix=1 HTTP/1.1" 200 2 61 Nacos-Server:2.0.3 10.201.110.5:15432
10.200.110.8 - - [26/Jul/2022:10:59:41 +0800] "DELETE /nacos/v1/ns/instance?app=unknown&namespaceId=prod&port=18741&clusterName=China&ip=x.x.x.x&ephemeral=false&serviceName= bservice&encoding=UTF-8&nofix=1 HTTP/1.1" 200 2 64 Nacos-Server:2.0.3 10.200.110.8:15432
58.248.226.14 - - [26/Jul/2022:11:02:00 +0800] "DELETE /nacos/v1/ns/instance?app=unknown&namespaceId=prod&port=18741&clusterName=China&ip=x.x.x.x&ephemeral=false&serviceName=bservice HTTP/1.1" 200 2 2 Nacos-Java-Client:v2.0.3 -
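When comparing several access-log entries like these, it helps to pull out the timestamp, query parameters and status. A minimal sketch, assuming the fields after the status code are response bytes, elapsed milliseconds and user agent (an assumption about the access-log pattern, not confirmed in this thread); the regular expression is fitted to the lines above and tolerates the space inside `serviceName= bservice`:

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Fitted to the entries above; the target is matched lazily up to " HTTP/x.y".
ACCESS_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>[^"]+?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) (?P<millis>\d+) (?P<agent>\S+)'
)


def parse_access_line(line):
    """Return a dict of the interesting fields, or None if the line does not match."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    target = urlsplit(m["target"])
    return {
        "client": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": target.path,
        # Keep the first value of each query parameter.
        "params": {k: v[0] for k, v in parse_qs(target.query).items()},
        "status": int(m["status"]),
        "millis": int(m["millis"]),
        "agent": m["agent"],
    }
```

Parsing each entry this way makes it easy to see, for example, that the three requests carry the same `ip`, `port` and `ephemeral=false` but differ in user agent.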
During this time period there was no exception in the jraft log, nothing at all in naming-raft.log, and no related exception in protocol-distro.log. With no abnormal logs, I had no leads for troubleshooting. After the first deletion, the console still showed this persistent instance (only as unhealthy, because the pod had gone offline). The NamingEvent received by the subscriber nacosSync at 10:38:50.162 still contained the instance that should have been deleted, with "healthy":false,"enabled":true,"ephemeral":false.
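Rather than relying on the 200 response, one way to confirm a delete is to list the service's instances, including unhealthy ones, since the leftover instance here was unhealthy. A minimal sketch, assuming the Open API endpoint `GET /nacos/v1/ns/instance/list` with a `healthyOnly` parameter and a JSON response whose `hosts` array carries `ip` and `port` fields; the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instance_list_url(server, *, namespace_id, service_name, group_name, clusters):
    """Build the instance-list URL; healthyOnly=false keeps unhealthy instances."""
    params = {
        "namespaceId": namespace_id,
        "serviceName": service_name,
        "groupName": group_name,
        "clusters": clusters,
        "healthyOnly": "false",
    }
    return f"{server}/nacos/v1/ns/instance/list?{urlencode(params)}"


def fetch(url, timeout=5.0):
    """GET the URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def still_registered(list_body, ip, port):
    """True if the instance-list JSON still contains the given ip:port."""
    hosts = json.loads(list_body).get("hosts", [])
    return any(h.get("ip") == ip and h.get("port") == port for h in hosts)
```

Calling `still_registered(fetch(instance_list_url(...)), "61.147.184.72", 18741)` shortly after each delete would have flagged the first request in this issue.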
Desktop (please complete the following information):
- OS: [CentOS]
- Version [nacos-server 2.0.3, nacos-client 2.0.3]
- Module [naming]
Confirmed that the service was not re-enabled by heartbeats.
Nacos authentication is not enabled, and every call to the interface returned OK.
// For reference: #8653, an issue found in 1.3.1, with the fix merged into 2.1.1
@hujun-w-2 Is this similar to the issue you reported?
2022-07-26 11:02:00,420 WARN remove instance from non-exist client: 61.147.184.72:18741#false
Judging from this log, it should already have been deleted.
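Check whether there is a successful health-check log after the instance was deleted. If there is, the health check re-registered the instance; this usually happens when an instance is deleted while it is still alive. Look in the naming-server and naming-event logs.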
Found in the naming-server log:
2022-07-26 10:38:49,581 INFO Positioning inconsistency remove Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=213}, 61.147.184.72:18741#false
2022-07-26 10:38:49,581 INFO Client remove for service Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=213}, 61.147.184.72:18741#false
2022-07-26 10:38:49,941 INFO Positioning inconsistency add Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=214}, 61.147.184.72:18741#false
2022-07-26 10:38:49,941 INFO Client change for service Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=214}, 61.147.184.72:18741#false
2022-07-26 10:59:41,380 INFO Positioning inconsistency remove Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=224}, 61.147.184.72:18741#false
2022-07-26 10:59:41,380 INFO Client remove for service Service{namespace='prod', group='web', name='bservice', ephemeral=false, revision=224}, 61.147.184.72:18741#false
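These naming-server lines read more easily as a per-client timeline. In this excerpt the first remove (revision 213, 10:38:49,581) is followed 360 ms later by an add (revision 214) for the same client ID, while the 10:59:41 remove is not followed by any add. A minimal sketch that pairs each `Positioning inconsistency remove` with a later `Positioning inconsistency add` for the same client; the regular expression is an assumption fitted to the lines shown here, not a documented log format, and the function names are hypothetical:

```python
import re
from datetime import datetime

# Fitted to the naming-server lines in this thread.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"(?P<msg>.+?) Service\{[^}]*revision=(?P<rev>\d+)\}, (?P<client>\S+)"
)


def parse_events(lines):
    """Return (timestamp, message, revision, client_id) tuples for matching lines."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m["msg"], int(m["rev"]), m["client"]))
    return events


def readds_after_remove(events, client_id):
    """Return (removed_at, readded_at) pairs where a remove is followed by an add."""
    last_remove = None
    pairs = []
    for ts, msg, _rev, client in sorted(events):
        if client != client_id:
            continue
        if msg.startswith("Positioning inconsistency remove"):
            last_remove = ts
        elif msg.startswith("Positioning inconsistency add") and last_remove is not None:
            pairs.append((last_remove, ts))
            last_remove = None
    return pairs
```

Running `readds_after_remove(parse_events(lines), "61.147.184.72:18741#false")` on the six lines above yields a single pair, for the 10:38:49 delete.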
No matching keywords in the naming-event log. If the health check had re-registered the instance, it should have logged from line 104 of HealthCheckCommonV2, right?

curl /nacos/v1/ns/upgrade/ops/metrics
upgraded = true
isAll20XVersion = true
isDoubleWriteEnabled = false
Check whether naming-event contains a tcp:ok entry around the time the instance was deleted. If it does, the health check caused the re-registration.
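This check can be scripted. A minimal sketch that keeps naming-event lines containing both `tcp:ok` and the instance IP within a window after the delete. It assumes each line starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp, as the naming-server lines in this thread do; the exact naming-event layout is not shown here, the five-minute window is an arbitrary choice, and the function name is hypothetical:

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def health_check_hits(lines, ip, deleted_at, window=timedelta(minutes=5),
                      marker="tcp:ok"):
    """Return lines containing marker and ip, logged within window after deleted_at.

    Assumes each line starts with a 23-character 'YYYY-MM-DD HH:MM:SS,mmm' timestamp.
    """
    hits = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            continue  # continuation line or a different layout
        if deleted_at <= ts <= deleted_at + window and marker in line and ip in line:
            hits.append(line)
    return hits
```

For the first delete here, `deleted_at` would be `datetime.strptime("2022-07-26 10:38:49,581", TS_FORMAT)`; an empty result on every node supports ruling out health-check re-registration.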
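The node that handled the delete request does not keep rolled-over logs, so that day's logs cannot be checked there. On the other nodes, none of the naming-event tcp:ok entries between 10:00 and 11:00 that day contain this instance's service name or IP.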
Are there any "61.147.184.72:18741#false disconnection" logs?
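The logs have already rolled over. I checked the nacos 2.0.3 code and nothing in it prints 'disconnection'; the jraft log does not contain the keyword 'disconnection' either.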
Then search for log lines similar to "61.147.184.72:18741#false remove service".
Yes, they are in the log excerpt above.
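That actually shows the deletion did complete. Do you mean the instance was still visible on the console after the deletion, or that applications were still calling this IP?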
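Applications were still calling this IP, which is why the problem was passed to me to investigate. I found that after the deletion (the delete request appears in the access log), the instance was still visible on the console. I'll try upgrading, or patching the code according to #8653.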
If you reproduce this problem again, please submit a new issue and provide the related logs. Thanks for your issue.