servicecomb-service-center

Service Center 2.1.0: the instance deregistration API has no effect, and the heartbeat timeout mechanism does not behave as expected

Open yhs0092 opened this issue 1 year ago • 1 comments

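Describe the bug In Service Center 2.1.0 the instance deregistration API has no effect: after a microservice built with the Java-Chassis framework completes its graceful shutdown, its instance can still be queried from sc. In addition, shortening the microservice's heartbeat interval does not shorten the time sc takes to automatically remove an instance whose heartbeats keep failing, so users still have to wait about 120 seconds before the instance record disappears. (Waiting 120 seconds before removing an instance whose heartbeats keep failing is the behavior under the default configuration.)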

To Reproduce

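  1. Download the linux-amd64 package from the Service Center 2.1.0 release notes and deploy it.
  2. Develop a microservice with the Java-Chassis framework and register it to sc, then stop the microservice to trigger the Java-Chassis graceful shutdown mechanism.
  3. Call the sc API to query the instance and observe when the instance record disappears.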
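Step 3 can be scripted. The following is a minimal Python sketch, assuming the same query URL as the curl loop later in this report. The names `http_fetch` and `wait_until_gone`, and the injectable `fetch`/`sleep`/`clock` parameters, are hypothetical helpers introduced here so the loop can be exercised without a running sc.

```python
import json
import time
import urllib.request

# Query URL copied from the curl loop in this report.
QUERY_URL = (
    "http://localhost:30100/v4/default/registry/instances"
    "?appId=nuwa-sdk-benchmark&serviceName=edge&global=true"
    "&version=0.0.0.0%2B&env=development"
)


def http_fetch(url=QUERY_URL):
    """Run one instance query and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_gone(instance_id, fetch=http_fetch, sleep=time.sleep,
                    clock=time.time, poll_seconds=1, max_polls=600):
    """Poll until instance_id is no longer listed in the query result.

    Returns the elapsed seconds, or None if the instance is still
    listed after max_polls queries.
    """
    started = clock()
    for _ in range(max_polls):
        body = fetch()
        # An empty body ({}) has no "instances" key, as seen in this report.
        ids = {inst.get("instanceId") for inst in body.get("instances", [])}
        if instance_id not in ids:
            return clock() - started
        sleep(poll_seconds)
    return None
```

Starting `wait_until_gone("<instanceId>")` right after Java-Chassis logs the deregistration gives the removal delay directly.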

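Expected behavior Once Java-Chassis logs that the instance has been deregistered, the instance should no longer be returned by sc. In practice, the instance disappears much later than the time of deregistration. (The sc-frontend page shows the same thing.)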

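  • Symptom 1:

    The Java-Chassis log shows the instance being deregistered at 17:43:47, but a shell while loop that queried the instance once per second kept returning it until 17:45:44. This indicates that the deregistration call made by Java-Chassis had no effect.

  • Symptom 2:

    Note the instance record returned by the query in the screenshot: I configured healthCheck.interval = 5 and healthCheck.times = 3, so in theory the instance should be taken offline for consecutive heartbeat failures no later than 20 seconds after heartbeats start failing. In practice sc waited 118 seconds before removing the instance. That is close to the result expected from the default heartbeat configuration (healthCheck.interval = 30, healthCheck.times = 3), so it seems the heartbeat settings configured in Java-Chassis did not take effect either.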
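The arithmetic behind the 20-second and 120-second figures can be written out. This sketch assumes the removal deadline is `interval * (times + 1)`; that formula is inferred from the numbers quoted in this report, not taken from the sc source, and `max_eviction_delay` is a hypothetical name.

```python
def max_eviction_delay(interval_s, times):
    """Upper bound, in seconds, before a silent instance is removed.

    Assumption: sc tolerates `times` missed heartbeats and removes the
    instance one interval later, i.e. interval * (times + 1).
    """
    return interval_s * (times + 1)


configured = max_eviction_delay(5, 3)    # settings reported by the instance
default = max_eviction_delay(30, 3)      # default heartbeat settings
observed = 118                           # seconds measured in this report

print(f"configured={configured}s default={default}s observed={observed}s")
```

Under this assumption the observed 118 seconds is within 2 seconds of the default deadline and almost 100 seconds past the configured one.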

Graceful shutdown log of the Java-Chassis framework: image

Query results from the sc API (collected with the script while [[ true ]]; do date && curl 'http://localhost:30100/v4/default/registry/instances?appId=nuwa-sdk-benchmark&serviceName=edge&global=true&version=0.0.0.0%2B&env=development' && echo '' && sleep 1 ; done):

Sat Jul  1 17:45:43 CST 2023
{"instances":[{"instanceId":"495ecf26ec904f07b4ad67c4ec3af28a","serviceId":"67cc7611960001018d36ef95288fd803ce35a2d7","endpoints":["rest://127.0.0.1:31000"],"hostName":"wuhpnuwa000002","status":"DOWN","healthCheck":{"mode":"push","interval":5,"times":3},"timestamp":"1688204588","modTimestamp":"1688204627","version":"3.0.0.101"}]}
Sat Jul  1 17:45:44 CST 2023
{"instances":[{"instanceId":"495ecf26ec904f07b4ad67c4ec3af28a","serviceId":"67cc7611960001018d36ef95288fd803ce35a2d7","endpoints":["rest://127.0.0.1:31000"],"hostName":"wuhpnuwa000002","status":"DOWN","healthCheck":{"mode":"push","interval":5,"times":3},"timestamp":"1688204588","modTimestamp":"1688204627","version":"3.0.0.101"}]}
Sat Jul  1 17:45:45 CST 2023
{}
Sat Jul  1 17:45:47 CST 2023
{}
Sat Jul  1 17:45:48 CST 2023
{}
Sat Jul  1 17:45:49 CST 2023
{}
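The `timestamp` and `modTimestamp` fields in the records above are epoch seconds, so they can be lined up against the `date` output. A small sketch, assuming CST here means UTC+8 (China Standard Time); `to_cst` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the date output is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def to_cst(epoch_seconds):
    """Convert an epoch-seconds string from the sc response to CST."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=CST)


registered = to_cst("1688204588")   # "timestamp" in the response
modified = to_cst("1688204627")     # "modTimestamp" in the response
# First query in the output above that returned {}.
first_empty = datetime(2023, 7, 1, 17, 45, 45, tzinfo=CST)

delay = int((first_empty - modified).total_seconds())
print(registered.time(), modified.time(), delay)
```

The converted `modTimestamp` falls on 17:43:47, the deregistration time reported by Java-Chassis, and the gap to the first empty response is the 118 seconds mentioned above.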

Platform And Runtime (please complete the following information):

Service Center 2.1.0, with the microservice developed on Java-Chassis 1.3.11.

SC version:

$ curl 'http://localhost:30100/v4/default/registry/version'
{"version":"2.1.0","buildTag":"20220314220818.2.1.0.9ecab25a","goVersion":"go1.15.1","os":"linux","arch":"amd64","apiVersion":"4.0.0"

Additional context

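I also tried keeping the Java-Chassis microservice running and calling the sc instance deregistration API separately. The Java-Chassis framework then reported a heartbeat failure because the instance did not exist, yet querying the instance still returned it. The logs are as follows: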

2023-07-01 17:43:03.396 [Service Center Task [1]] WARN  - [ServiceRegistryClientImpl.java:heartbeat:652] - [] - Bad Request
2023-07-01 17:43:03.396 [Service Center Task [1]] ERROR - [MicroserviceInstanceHeartbeatTask.java:heartbeat:79] - [] - Update heartbeat to service center failed, microservice instance=67cc7611960001018d36ef95288fd803ce35a2d7/495ecf26ec904f07b4ad67c4ec3af28a does not exist
2023-07-01 17:43:03.397 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onMicroserviceInstanceHeartbeatTask:76] - [] - read MicroserviceInstanceHeartbeatTask status is READY
2023-07-01 17:43:03.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:onMicroserviceInstanceHeartbeatTask:61] - [] - read MicroserviceInstanceHeartbeatTask status is READY
2023-07-01 17:43:08.396 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:doRegister:78] - [] - running microservice register task.
2023-07-01 17:43:08.396 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:doRegister:86] - [] - Microservice exists in service center, no need to register. id=[67cc7611960001018d36ef95288fd803ce35a2d7] appId=[nuwa-sdk-benchmark], name=[edge], version=[3.0.0.101], env=[development]
2023-07-01 17:43:08.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:checkSchemaIdSet:149] - [] - SchemaIds are equals to service center. serviceId=[67cc7611960001018d36ef95288fd803ce35a2d7], appId=[nuwa-sdk-benchmark], name=[edge], version=[3.0.0.101], env=[development], schemaIds=[metricsEndpoint, healthEndpoint]
2023-07-01 17:43:08.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:registerSchema:194] - [] - schemaId [metricsEndpoint] exists [true], summary exists [true]
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:registerSchema:194] - [] - schemaId [healthEndpoint] exists [true], summary exists [true]
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onRegisterTask:64] - [] - read MicroserviceRegisterTask status is FINISHED
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [MicroserviceInstanceRegisterTask.java:doRegister:59] - [] - running microservice instance register task.
2023-07-01 17:43:08.406 [Service Center Task [1]] INFO  - [MicroserviceInstanceRegisterTask.java:doRegister:81] - [] - Register microservice instance success. microserviceId=67cc7611960001018d36ef95288fd803ce35a2d7 instanceId=495ecf26ec904f07b4ad67c4ec3af28a endpoints=[rest://127.0.0.1:31000] lease 20s
2023-07-01 17:43:08.407 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onRegisterTask:64] - [] - read MicroserviceInstanceRegisterTask status is FINISHED
2023-07-01 17:43:08.407 [Service Center Task [1]] INFO  - [MicroserviceInstanceStatusSyncTask.java:onMicroserviceRegisterTask:40] - [] - start synchronizing instance status
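For reference, the manual deregistration call described above might be built as follows. This is a sketch only: the path `/v4/{project}/registry/microservices/{serviceId}/instances/{instanceId}` with the DELETE method is my understanding of the v4 API and should be checked against v4.yaml; `build_unregister_request` is a hypothetical helper, and any headers your deployment requires are not included.

```python
import urllib.request


def build_unregister_request(service_id, instance_id,
                             base_url="http://localhost:30100",
                             project="default"):
    """Build (but do not send) a DELETE request for one instance.

    Assumption: the path layout follows the sc v4 OpenAPI document.
    """
    url = (f"{base_url}/v4/{project}/registry/microservices/"
           f"{service_id}/instances/{instance_id}")
    return urllib.request.Request(url, method="DELETE")


req = build_unregister_request(
    "67cc7611960001018d36ef95288fd803ce35a2d7",
    "495ecf26ec904f07b4ad67c4ec3af28a",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the call; the sketch stops short of that so it can be inspected first.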

The heartbeat API reports a failure, yet the find API defined in v4.yaml still returns the instance: https://github.com/apache/servicecomb-service-center/blob/master/docs/openapi/v4.yaml#L1165

It looks as if the data behind sc's APIs is internally inconsistent.
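The suspected inconsistency can be stated as a check over the two responses. A minimal sketch, assuming a rejected heartbeat shows up as an HTTP status of 400 or above (the log above shows "Bad Request") and that the find response has the same shape as the query output in this report; both function names are hypothetical.

```python
def find_lists_instance(find_body, instance_id):
    """True if the find response still contains instance_id."""
    return any(inst.get("instanceId") == instance_id
               for inst in find_body.get("instances", []))


def looks_inconsistent(heartbeat_status, find_body, instance_id):
    """True when the heartbeat is rejected but find still lists the instance."""
    heartbeat_rejected = heartbeat_status >= 400
    return heartbeat_rejected and find_lists_instance(find_body, instance_id)


# Values trimmed from the logs and query output in this report.
instance_id = "495ecf26ec904f07b4ad67c4ec3af28a"
find_body = {"instances": [{"instanceId": instance_id, "status": "DOWN"}]}
print(looks_inconsistent(400, find_body, instance_id))
```

Running this check against live responses after each deregistration would show how long the two APIs disagree.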

yhs0092 Jul 01 '23 10:07