Exception logs when running a Nacos cluster in Docker

Open zhu121 opened this issue 2 years ago • 4 comments

Version

  • os: CentOS Linux release 7.9.2009 (Core)
  • docker: Docker version 20.10.7, build f0df350
  • nacos: nacos/nacos-server:v2.1.0

Operation

  • Opened ports 8148/8248/8348 and 9148/9248/9348 with firewall-cmd --zone=public --add-port=xxx/tcp --permanent
  • On a single host, ran docker compose -f docker-compose-nacos.yaml --env-file nacos.env up -d
  • Checked the logs under nacos/logs and found quite a lot of exceptions
  • Checked the startup result with docker logs xxx, which shows: Nacos started successfully in cluster mode. use external storage

Code

version: "3"
services:
  nacos1:
    image: nacos/nacos-server:${NACOS_VERSION}
    hostname: nacos1
    container_name: nacos1
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVERS=nacos1:8848 nacos2:8848 nacos3:8848
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=${MYSQL_SERVICE_HOST}
      - MYSQL_SERVICE_PORT=${MYSQL_SERVICE_PORT}
      - MYSQL_SERVICE_USER=${MYSQL_SERVICE_USER}
      - MYSQL_SERVICE_PASSWORD=${MYSQL_SERVICE_PWD}
      - MYSQL_SERVICE_DB_NAME=${MYSQL_SERVICE_DB}
      - JVM_XMS=128m
      - JVM_XMX=128m
      - JVM_XMN=128m
    volumes:
      - ${NACOS_HOME}/nacos1/logs:/home/nacos/logs
      - ${NACOS_HOME}/nacos1/init.d:/home/nacos/init.d
    ports:
      - "8148:8848"
      - "9148:9848"
    privileged: true
    restart: on-failure
  nacos2:
    image: nacos/nacos-server:${NACOS_VERSION}
    hostname: nacos2
    container_name: nacos2
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVERS=nacos1:8848 nacos2:8848 nacos3:8848
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=${MYSQL_SERVICE_HOST}
      - MYSQL_SERVICE_PORT=${MYSQL_SERVICE_PORT}
      - MYSQL_SERVICE_USER=${MYSQL_SERVICE_USER}
      - MYSQL_SERVICE_PASSWORD=${MYSQL_SERVICE_PWD}
      - MYSQL_SERVICE_DB_NAME=${MYSQL_SERVICE_DB}
      - JVM_XMS=128m
      - JVM_XMX=128m
      - JVM_XMN=128m
    volumes:
      - ${NACOS_HOME}/nacos2/logs:/home/nacos/logs
      - ${NACOS_HOME}/nacos2/init.d:/home/nacos/init.d
    ports:
      - "8248:8848"
      - "9248:9848"
    privileged: true
    restart: on-failure
  nacos3:
    image: nacos/nacos-server:${NACOS_VERSION}
    hostname: nacos3
    container_name: nacos3
    environment:
      - MODE=cluster
      - PREFER_HOST_MODE=hostname
      - NACOS_SERVERS=nacos1:8848 nacos2:8848 nacos3:8848
      - SPRING_DATASOURCE_PLATFORM=mysql
      - MYSQL_SERVICE_HOST=${MYSQL_SERVICE_HOST}
      - MYSQL_SERVICE_PORT=${MYSQL_SERVICE_PORT}
      - MYSQL_SERVICE_USER=${MYSQL_SERVICE_USER}
      - MYSQL_SERVICE_PASSWORD=${MYSQL_SERVICE_PWD}
      - MYSQL_SERVICE_DB_NAME=${MYSQL_SERVICE_DB}
      - JVM_XMS=128m
      - JVM_XMX=128m
      - JVM_XMN=128m
    volumes:
      - ${NACOS_HOME}/nacos3/logs:/home/nacos/logs
      - ${NACOS_HOME}/nacos3/init.d:/home/nacos/init.d
    ports:
      - "8348:8848"
      - "9348:9848"
    privileged: true
    restart: on-failure

Exception

./naming-server.log
2022-08-10 14:44:36,216 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos2:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:38,963 WARN Exception while request: http://nacos3:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos3:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:40,927 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/distro/datums, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
2022-08-10 14:44:40,928 WARN Exception while request: http://nacos3:8848/nacos/v1/ns/distro/datums, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
2022-08-10 14:44:40,967 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos2:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:44,962 WARN Exception while request: http://nacos3:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos3:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:46,980 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos2:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:51,139 WARN Exception while request: http://nacos3:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos3:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
2022-08-10 14:44:52,961 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state, caused: {}
org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
java.io.IOException: failed to req API:http://nacos2:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
java.net.ConnectException: Connection refused
java.io.IOException: failed to req API:http://nacos3:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: caused: unable to find local peer: nacos3:8848, all peers: [];
java.io.IOException: failed to req API:http://nacos2:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: caused: unable to find local peer: nacos2:8848, all peers: [];

./nacos.log
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA frame for an unknown stream 3
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:147)
2022-08-10 14:44:50,008 INFO Creating filter chain: any request, [org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter@f096f37, org.springframework.security.web.context.SecurityContextPersistenceFilter@3d6a6bee, org.springframework.security.web.header.HeaderWriterFilter@30e6a763, org.springframework.security.web.csrf.CsrfFilter@fca387, org.springframework.security.web.authentication.logout.LogoutFilter@4b2e3e8f, org.springframework.security.web.savedrequest.RequestCacheAwareFilter@213c3543, org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter@3cff0139, org.springframework.security.web.authentication.AnonymousAuthenticationFilter@3effd4f3, org.springframework.security.web.session.SessionManagementFilter@732c9b5c, org.springframework.security.web.access.ExceptionTranslationFilter@3ae0b770]
java.lang.IllegalStateException: unable to find local peer: nacos1:8848, all peers: []
java.lang.IllegalStateException: unable to find local peer: nacos1:8848, all peers: []

./protocol-raft.log
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service_v2, Fail to find node nacos3:7848 in group naming_persistent_service_v2, Unknown leader, Fail to find node nacos2:7848 in group naming_persistent_service_v2
java.lang.IllegalStateException: Fail to get leader of group naming_instance_metadata, Fail to find node nacos3:7848 in group naming_instance_metadata, Unknown leader, Fail to find node nacos2:7848 in group naming_instance_metadata
java.lang.IllegalStateException: Fail to get leader of group naming_service_metadata, Fail to find node nacos3:7848 in group naming_service_metadata, Unknown leader, Fail to find node nacos2:7848 in group naming_service_metadata
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service_v2, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_service_metadata, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service_v2, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_instance_metadata, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_service_metadata, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_instance_metadata, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_persistent_service_v2, Unknown leader, Unknown leader, Unknown leader
java.lang.IllegalStateException: Fail to get leader of group naming_instance_metadata, Unknown leader, Unknown leader, Unknown leader

./protocol-distro.log
com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION]Get snapshot from nacos2:8848 failed.
Caused by: java.io.IOException: failed to req API: http://nacos2:8848/nacos/v1/ns/distro/datums. code: 500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos2:8848 [nacos2/172.18.0.3] failed: Connection refused (Connection refused)
com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION]Get snapshot from nacos3:8848 failed.
Caused by: java.io.IOException: failed to req API: http://nacos3:8848/nacos/v1/ns/distro/datums. code: 500 msg: org.apache.http.conn.HttpHostConnectException: Connect to nacos3:8848 [nacos3/172.18.0.4] failed: Connection refused (Connection refused)
com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION][DISTRO-FAILED] Get distro snapshot failed!
Caused by: com.alibaba.nacos.api.exception.NacosException: No rpc client related to member: Member{ip='nacos2', port=8848, state=UP, extendInfo={raftPort=7848, readyToUpgrade=true}}
com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION][DISTRO-FAILED] Get distro snapshot failed!
Caused by: com.alibaba.nacos.api.exception.NacosException: No rpc client related to member: Member{ip='nacos3', port=8848, state=UP, extendInfo={raftPort=7848, readyToUpgrade=true}}

./alipay-jraft.log
2022-08-10 14:44:31,313 ERROR Fail to connect nacos3:7848, remoting exception: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 0.444879771s. [buffered_nanos=595476345, waiting_for_connection].
2022-08-10 14:44:32,583 ERROR Fail to connect nacos1:7848, remoting exception: java.util.concurrent.TimeoutException.
io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)
2022-08-10 14:44:33,590 ERROR Fail to connect nacos2:7848, remoting exception: java.util.concurrent.TimeoutException.
2022-08-10 14:44:46,979 ERROR Fail to connect nacos3:7848, remoting exception: java.util.concurrent.TimeoutException.
2022-08-10 14:44:47,011 ERROR Fail to connect nacos3:7848, remoting exception: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 0.999782639s. [buffered_nanos=740087279, remote_addr=nacos3/172.18.0.4:7848].
io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)

zhu121 avatar Aug 10 '22 07:08 zhu121
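To judge whether the errors above are confined to startup, it can help to measure how long they span. The following is a minimal Python sketch, not part of the original report: `error_window_seconds` is a hypothetical helper, and `LOG_LINES` holds four timestamped entries copied from the excerpts above with their messages shortened. Because the excerpts are partial, the result only bounds the quoted entries, not the full logs.

```python
import re
from datetime import datetime

# Four timestamped entries taken from the excerpts above (messages shortened).
LOG_LINES = [
    "2022-08-10 14:44:31,313 ERROR Fail to connect nacos3:7848",
    "2022-08-10 14:44:36,216 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state",
    "2022-08-10 14:44:50,008 INFO Creating filter chain: any request",
    "2022-08-10 14:44:52,961 WARN Exception while request: http://nacos2:8848/nacos/v1/ns/operator/cluster/state",
]

# Matches only WARN and ERROR entries; INFO lines are ignored.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (WARN|ERROR)\b")


def error_window_seconds(lines):
    """Return the seconds between the first and last WARN/ERROR entry, or None."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds()


print(error_window_seconds(LOG_LINES))  # 21.648
```

A window of a few tens of seconds right after `up -d` would be consistent with nodes racing each other during startup; errors that keep appearing long afterwards would point elsewhere.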

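This is a network problem: port 8848 inside the containers is published as 8148/8248/8348. Either put these containers on the same network, or have them reach each other through the published external ports.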

YunWZ avatar Aug 10 '22 09:08 YunWZ
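The distinction drawn in the comment above, between addresses used inside the container network and addresses used from outside, can be sketched in Python. The helper names `parse_port_mapping` and `peer_address` and the `<host-ip>` placeholder are made up for illustration; the mapping strings are copied from the Compose file in this issue. The sketch assumes all three services share one Compose network, which the log lines resolving nacos2 to 172.18.0.3 suggest but do not prove.

```python
from typing import Tuple


def parse_port_mapping(mapping: str) -> Tuple[int, int]:
    """Split a Compose short-syntax mapping "HOST:CONTAINER" into two ints."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def peer_address(service: str, mapping: str, inside_network: bool) -> str:
    """Return the address a client would use to reach `service`.

    Inside the shared network, peers use the service name and the container
    port. From outside Docker, clients use the host address and the published
    port. "<host-ip>" is a placeholder, not a real address.
    """
    host_port, container_port = parse_port_mapping(mapping)
    if inside_network:
        return f"{service}:{container_port}"
    return f"<host-ip>:{host_port}"


# Mapping for nacos2, copied from the Compose file in this issue.
print(peer_address("nacos2", "8248:8848", inside_network=True))
print(peer_address("nacos2", "8248:8848", inside_network=False))
```

Under that assumption, the NACOS_SERVERS value in the Compose file addresses peers on the container port, and the 8148/8248/8348 mappings only matter for clients outside Docker.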

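  • The three Nacos nodes run in Docker on the same CentOS host, which is why 8148/8248/8348 are mapped to 8848 inside the containers. That part should be fine, right?
  • Regarding "reach each other through the published external ports": I had not opened the Nacos console before I saw the errors in the logs. Afterwards, opening the console at the host IP on 8148/8248/8348 worked for each node.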

zhu121 avatar Aug 10 '22 09:08 zhu121

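Does the cluster information in the console list all three servers? If it does, this is normal. Your three nodes start at the same time, so the first node begins connecting to nacos2 and nacos3 before those two are ready, and error logs at that point are to be expected.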

YunWZ avatar Aug 10 '22 09:08 YunWZ
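The startup race described in the comment above can be checked by polling each peer until its port accepts connections. This is a minimal Python sketch using only the standard library; `wait_for_port` and its default timeouts are hypothetical choices, not something Nacos or nacos-docker provides.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.

    Returns True as soon as one connection attempt succeeds, False if the
    deadline passes first. Refused connections, timeouts, and name-resolution
    failures are all treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Run from inside one of the containers, or from any container on the same network, a call such as `wait_for_port("nacos2", 8848)` that eventually returns True would support the explanation that the refused connections were transient. A call that keeps returning False would suggest a persistent problem.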

  • The node list in the console shows nacos1:8848, nacos2:8848, and nacos3:8848
  • The node metadata, after comparing the three nodes, is the same on each:
{
    "lastRefreshTime": 1660118412368,
    "raftMetaData": {
        "metaDataMap": {
            "naming_instance_metadata": {
                "leader": "nacos2:7848",
                "raftGroupMember": [
                    "nacos3:7848",
                    "nacos1:7848",
                    "nacos2:7848"
                ],
                "term": 2
            },
            "naming_persistent_service": {
                "leader": "nacos3:7848",
                "raftGroupMember": [
                    "nacos3:7848",
                    "nacos1:7848",
                    "nacos2:7848"
                ],
                "term": 2
            },
            "naming_persistent_service_v2": {
                "leader": "nacos3:7848",
                "raftGroupMember": [
                    "nacos3:7848",
                    "nacos1:7848",
                    "nacos2:7848"
                ],
                "term": 2
            },
            "naming_service_metadata": {
                "leader": "nacos3:7848",
                "raftGroupMember": [
                    "nacos3:7848",
                    "nacos1:7848",
                    "nacos2:7848"
                ],
                "term": 2
            }
        }
    },
    "raftPort": "7848",
    "readyToUpgrade": true,
    "version": "2.1.0"
}

zhu121 avatar Aug 10 '22 09:08 zhu121
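The metadata above can be checked mechanically for Raft groups that lack an elected leader. The sketch below is a hypothetical helper written for this thread, not a Nacos API; `NODE_METADATA` is a trimmed copy of the posted JSON containing two of the four groups.

```python
import json

# Trimmed copy of the node metadata posted above (two of the four groups).
NODE_METADATA = """
{
    "raftMetaData": {
        "metaDataMap": {
            "naming_instance_metadata": {
                "leader": "nacos2:7848",
                "raftGroupMember": ["nacos3:7848", "nacos1:7848", "nacos2:7848"],
                "term": 2
            },
            "naming_persistent_service": {
                "leader": "nacos3:7848",
                "raftGroupMember": ["nacos3:7848", "nacos1:7848", "nacos2:7848"],
                "term": 2
            }
        }
    },
    "raftPort": "7848"
}
"""


def groups_without_valid_leader(metadata: dict) -> list:
    """Return names of Raft groups whose leader is missing or not a member."""
    problems = []
    groups = metadata.get("raftMetaData", {}).get("metaDataMap", {})
    for name, group in groups.items():
        leader = group.get("leader")
        if not leader or leader not in group.get("raftGroupMember", []):
            problems.append(name)
    return problems


print(groups_without_valid_leader(json.loads(NODE_METADATA)))  # []
```

An empty list means every listed group reports a leader drawn from its own members, which is what the posted metadata shows, in contrast to the earlier "Unknown leader" entries in protocol-raft.log.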

Connection refused (Connection refused)

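This error means the port is not being listened on correctly. It is an environment problem, and you will need to troubleshoot it yourself.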

KomachiSion avatar Aug 11 '22 06:08 KomachiSion

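My .yaml file (see Code) is adapted from the official cluster-hostname.yaml. Apart from the MySQL settings it is essentially the same, and the environment steps are the ones listed under Operation. Is there a step I am missing?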

zhu121 avatar Aug 11 '22 07:08 zhu121

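Judging from the errors, connections from the nacos1 node to port 8848 on nacos2 and nacos3 are being refused. Either those nodes started without listening on that port, or something in the Docker configuration is wrong and access to the port fails.

Try running it exactly as described in the nacos-docker instructions.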

KomachiSion avatar Aug 12 '22 09:08 KomachiSion
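The two possibilities named in the comment above can be narrowed down by classifying how a single connection attempt fails. This is a rough Python sketch with a hypothetical `probe` helper; the labels it returns are illustrative, and the interpretation in the comments is a rule of thumb rather than a definitive diagnosis.

```python
import socket


def probe(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Try one TCP connection and describe the outcome.

    "open":       something accepted the connection.
    "unresolved": the host name could not be resolved.
    "refused":    the host answered but nothing accepted on that port.
    "timeout":    no answer within timeout_s (often dropped packets).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except socket.gaierror:
        return "unresolved"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe("nacos2", 8848)` run from inside the nacos1 container would return "refused" while nacos2 is not yet listening, matching the log entries, and "open" once it is.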

No further response from the author. I think this is an environment problem.

KomachiSion avatar Aug 22 '22 02:08 KomachiSion