bk-cmdb
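Replacing the community edition (3.9.27) with open-source CMDB (3.9.37) fails
Problem description
cmdb was upgraded from community edition v3.9.27 to open-source edition v3.9.37, and the services are unhealthy after the upgrade.
Steps to reproduce
Followed this document: How to replace the community edition with open-source CMDB. (The configuration files were not regenerated; the old conf was reused.)
Key information
Please provide the following information:
- [x] bk-cmdb version: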
Version : v3.9.37
Tag : ocr-v3.9.37
BuildTime : 2022-08-25T01:34:30+0800
GitHash : v3.9.37
RunMode : product
Distribution: community
ServiceName : apiserver
- [x] Redis version: v5.0.9
- [x] MongoDB version: v4.2.3
- [x] ZooKeeper version: v3.4.14
- [x] Operating system: CentOS 7 (x64)
- [x] bk-cmdb error logs (bkcli check cmdb):
[1] 10:57:51 [FAILURE] 192.168.0.103 Exited with error code 3
cmdb-admin(http://192.168.0.103:9000/healthz): true
cmdb-api(http://192.168.0.103:9001/healthz) : false Reason: {"code":0,"ok":false,"message":"api server is unhealthy","data":{"module":"apiserver","address":"","healthy":false,"message":"api server is unhealthy","items":[{"name":"servicediscover","healthy":true,"message":""},{"name":"topo","healthy":true,"message":""},{"name":"host","healthy":true,"message":""},{"name":"proc","healthy":true,"message":""},{"name":"eventserver","healthy":true,"message":""},{"name":"datacollection","healthy":false,"message":"oops, there is no datacollection can be used"},{"name":"operation","healthy":true,"message":""},{"name":"task","healthy":true,"message":""},{"name":"cloud","healthy":true,"message":""}],"at_time":"2022-08-25 02:57:50","version":{"commit_id":"v3.9.37","time":"2022-08-25T01:34:30+0800","version":"v3.9.37"}},"result":false}
cmdb-auth(http://192.168.0.103:9002/healthz) : true
cmdb-cache(http://192.168.0.103:9014/healthz): true
cmdb-cloud(http://192.168.0.103:9003/healthz): true
cmdb-core(http://192.168.0.103:9004/healthz) : true
cmdb-datacollection(http://192.168.0.103:9005/healthz): false Reason: connection refused
cmdb-event(http://192.168.0.103:9006/healthz): true
cmdb-host(http://192.168.0.103:9007/healthz) : true
cmdb-operation(http://192.168.0.103:9008/healthz): true
cmdb-proc(http://192.168.0.103:9009/healthz) : true
cmdb-task(http://192.168.0.103:9011/healthz) : true
cmdb-topo(http://192.168.0.103:9012/healthz) : true
cmdb-web(http://192.168.0.103:9013/healthz) : false Reason: {"code":0,"ok":false,"message":"web server is unhealthy","data":{"module":"webserver","address":"","healthy":false,"message":"web server is unhealthy","items":[{"name":"servicediscover","healthy":true,"message":""},{"name":"apiserver","healthy":false,"message":"api server is unhealthy"}],"at_time":"2022-08-25 02:57:51","version":{"commit_id":"v3.9.37","time":"2022-08-25T01:35:04+0800","version":"v3.9.37"}},"result":false}
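The Reason payloads above are JSON, and the failing dependency is listed under data.items. As a minimal sketch (the helper name `unhealthy_items` is hypothetical, and the sample payload is abridged from the cmdb-api response above), the unhealthy entries could be pulled out like this:

```python
from __future__ import annotations

import json


def unhealthy_items(payload: str) -> list[tuple[str, str]]:
    """Return (name, message) for every item the healthz payload marks unhealthy."""
    body = json.loads(payload)
    items = body.get("data", {}).get("items", [])
    return [
        (item["name"], item.get("message", ""))
        for item in items
        if not item.get("healthy", False)
    ]


# Abridged from the cmdb-api healthz response in this report.
sample = (
    '{"code":0,"ok":false,"message":"api server is unhealthy",'
    '"data":{"module":"apiserver","healthy":false,"items":['
    '{"name":"servicediscover","healthy":true,"message":""},'
    '{"name":"topo","healthy":true,"message":""},'
    '{"name":"datacollection","healthy":false,'
    '"message":"oops, there is no datacollection can be used"}]},'
    '"result":false}'
)

if __name__ == "__main__":
    for name, message in unhealthy_items(sample):
        print(f"{name}: {message}")
```

For the cmdb-api payload this points at datacollection only, which matches the "connection refused" reported for cmdb-datacollection on port 9005.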
apiserver.ERROR:
E0825 11:02:00.050819 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": EOF, rid: 74c67ba02441434a8f5171667151eb53
E0825 11:02:00.280291 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/findmany/cloudarea] , err: Post "http://192.168.0.103:9007/host/v3/findmany/cloudarea": EOF, rid: f3e9295781de4111ac7f3a502da43950
E0825 11:02:01.398898 479 service/http.go:82] *failed do request[GET url: http://192.168.0.103:9012/topo/v3/app/with_reduced?sort=bk_biz_id] , err: Get "http://192.168.0.103:9012/topo/v3/app/with_reduced?sort=bk_biz_id": dial tcp 192.168.0.103:9012: connect: connection refused, rid: cc0000cc3ecad50lvg35u8lpag
E0825 11:02:01.404576 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/usercustom/user/search] , err: Post "http://192.168.0.103:9007/host/v3/usercustom/user/search": dial tcp 192.168.0.103:9007: connect: connection refused, rid: cc0000cc3ecad50lvg35u8lpc0
E0825 11:02:01.404652 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/find/classificationobject] , err: Post "http://192.168.0.103:9012/topo/v3/find/classificationobject": dial tcp 192.168.0.103:9012: connect: connection refused, rid: cc0000cc3ecad50lvg35u8lpbg
E0825 11:02:01.406450 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/usercustom/default/model] , err: Post "http://192.168.0.103:9007/host/v3/usercustom/default/model": dial tcp 192.168.0.103:9007: connect: connection refused, rid: cc0000cc3ecad50lvg35u8lpcg
E0825 11:03:00.083384 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": EOF, rid: a9c1bf674d0046288878e96684f8a213
E0825 11:03:00.337323 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/findmany/cloudarea] , err: Post "http://192.168.0.103:9007/host/v3/findmany/cloudarea": EOF, rid: 89b1ef9767224eb3a11308e6982ff2e1
E0825 11:04:00.056175 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": EOF, rid: dc6af2367eec4be2840f335582334d3c
E0825 11:04:00.393054 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/findmany/cloudarea] , err: Post "http://192.168.0.103:9007/host/v3/findmany/cloudarea": EOF, rid: 64d2095734654415a33a2e5ad16ef394
E0825 11:05:00.064368 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/find/topoinst/biz/2] , err: Post "http://192.168.0.103:9012/topo/v3/find/topoinst/biz/2": EOF, rid: 0e35d7e3c52c455db20c54a57a815e53
E0825 11:05:00.064566 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": read tcp 192.168.0.103:51700->192.168.0.103:9012: read: connection reset by peer, rid: 336e9c79fc24409888d80b0b23d80613
E0825 11:05:00.092594 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/find/topoinst/biz/2] , err: Post "http://192.168.0.103:9012/topo/v3/find/topoinst/biz/2": dial tcp 192.168.0.103:9012: connect: connection refused, rid: 06b86ab78de946259c0f6629664767e6
E0825 11:06:00.055084 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": EOF, rid: 5dc757355c5e44e6a5a527b4e4da0026
E0825 11:06:00.479673 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/findmany/cloudarea] , err: Post "http://192.168.0.103:9007/host/v3/findmany/cloudarea": EOF, rid: 73bc5aa0f92d4d6cabaee0f9a58b5a2c
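To see which downstream services the apiserver fails to reach, the lines above can be grouped by target port and error kind. This is a rough sketch that assumes the exact line layout shown in this report; `summarize` and `PATTERN` are hypothetical names, and the three sample lines are copied from the log above:

```python
import re
from collections import Counter

# Matches the "failed do request[...] , err: ..., rid: ..." layout seen above.
PATTERN = re.compile(
    r"url: http://[^:/\s]+:(?P<port>\d+)[^\]\s]*\] , err: (?P<err>.*), rid: "
)


def summarize(lines):
    """Count failed requests per (target port, error kind)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a "failed do request" line
        # The error kind is the last ": "-separated segment of the error text.
        kind = match.group("err").rsplit(": ", 1)[-1]
        counts[(int(match.group("port")), kind)] += 1
    return counts


SAMPLE = [
    'E0825 11:02:00.050819 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": EOF, rid: 74c67ba02441434a8f5171667151eb53',
    'E0825 11:02:01.404576 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9007/host/v3/usercustom/user/search] , err: Post "http://192.168.0.103:9007/host/v3/usercustom/user/search": dial tcp 192.168.0.103:9007: connect: connection refused, rid: cc0000cc3ecad50lvg35u8lpc0',
    'E0825 11:05:00.064566 479 service/http.go:82] *failed do request[POST url: http://192.168.0.103:9012/topo/v3/app/search/0] , err: Post "http://192.168.0.103:9012/topo/v3/app/search/0": read tcp 192.168.0.103:51700->192.168.0.103:9012: read: connection reset by peer, rid: 336e9c79fc24409888d80b0b23d80613',
]

if __name__ == "__main__":
    for (port, kind), n in sorted(summarize(SAMPLE).items()):
        print(port, kind, n)
```

In this log every failure targets port 9012 (cmdb-topo) or 9007 (cmdb-host). The mix of EOF, connection reset, and connection refused may indicate that those processes are restarting, though only their own ERROR logs would confirm that.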
webserver.ERROR:
E0825 11:19:58.629462 407 configcenter/cc.go:147] config center received event that common config has changed, but got err: zk: connection closed
E0825 11:19:58.629496 407 configcenter/cc.go:257] config center received event that *LANGUAGE* config has changed, but got err: zk: connection closed
E0825 11:19:58.629531 407 configcenter/cc.go:235] config center received event that *ERROR CODE* config has changed, but got err: zk: connection closed
E0825 11:19:58.629598 407 configcenter/cc.go:194] config center received event that common config has changed, but got err: zk: connection closed
E0825 11:19:58.629630 407 configcenter/cc.go:214] config center received event that common config has changed, but got err: zk: connection closed
E0825 11:19:58.629651 407 backbone/notice.go:57] log watch failed, will watch after 10s, path: /cc/services/notice/log/192.168.0.103:9013/v, err: zk: connection closed
E0825 11:19:58.629709 407 configcenter/cc.go:169] config center received event that extra config has changed, but got err: zk: connection closed
E0825 11:19:58.667973 407 backbone/notice.go:60] fail to watch register node(/cc/services/notice/log/192.168.0.103:9013/v), reason: connect closed. retry connect err:zk: could not connect to a server
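Both "connection refused" and "zk: could not connect to a server" come down to whether a TCP port accepts connections. A minimal probe, as a sketch only: `tcp_reachable` is a hypothetical helper, and the ZooKeeper address is not given in this report, so port 2181 (ZooKeeper's default client port) in the commented usage is an assumption.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example usage (run on the CMDB host):
#   tcp_reachable("192.168.0.103", 9005)  # cmdb-datacollection, from the health check
#   tcp_reachable("192.168.0.103", 2181)  # ZooKeeper; default port, an assumption
```

A successful connect only shows that something is listening on the port. It does not show that the service behind it is healthy.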
The 3.9 configuration files are in YAML format and are the same. It looks like there is a problem with zk; try reconnecting and see.
Reconnecting does not resolve it.
It looks like this is caused by datacollection not starting properly; please check the datacollection error log.
datacollection.ERROR:
Log file created at: 2022/08/30 13:21:41
Running on machine: blueking003
Binary: Built with gc go1.19 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
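The error log at the top is from 10:57: the apiserver and webserver healthz checks failed because service discovery found no datacollection. This log is from datacollection after 13:21:41, so the two points in time do not match. What error are you seeing now?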
+1, the error log is the same with 3.9.42.
What does the error log contain? Could you provide it?