
[Question] When setting up a cluster, do the pd, store, and server nodes need a one-to-one mapping? Does a 3pd-6store-3server layout cause any problems?

Open Tommu10ve opened this issue 6 months ago • 10 comments

Problem Type

None

Before submit

  • [x] I have confirmed and searched that there are no similar or duplicate problems in the existing Issues and FAQ documents

Environment

  • Server Version: 1.5.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD

Your Question

When setting up the cluster, do the pd, store, and server nodes need to correspond one-to-one? Does a 3pd-6store-3server layout cause any problems?


version: "3"

services:
  pd0:
    image: hugegraph/pd
    container_name: pd0
    hostname: pd0
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8620"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd0.yml:/hugegraph-pd/conf/application.yml

  pd1:
    image: hugegraph/pd
    container_name: pd1
    hostname: pd1
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8621"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd1.yml:/hugegraph-pd/conf/application.yml

  pd2:
    image: hugegraph/pd
    container_name: pd2
    hostname: pd2
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8622"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd2.yml:/hugegraph-pd/conf/application.yml

  store0:
    image: hugegraph/store
    container_name: store0
    hostname: store0
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8520"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store0.yml:/hugegraph-store/conf/application.yml

  store1:
    image: hugegraph/store
    container_name: store1
    hostname: store1
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8521"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store1.yml:/hugegraph-store/conf/application.yml

  store2:
    image: hugegraph/store
    container_name: store2
    hostname: store2
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8522"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store2.yml:/hugegraph-store/conf/application.yml

  server1:
    image: hugegraph/server
    container_name: server1
    hostname: server1
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server1-conf:/hugegraph-server/conf

  server2:
    image: hugegraph/server
    container_name: server2
    hostname: server2
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8082"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server2-conf:/hugegraph-server/conf

  server3:
    image: hugegraph/server
    container_name: server3
    hostname: server3
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8083"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server3-conf:/hugegraph-server/conf

Vertex/Edge example


Schema [VertexLabel, EdgeLabel, IndexLabel]


Tommu10ve · Jun 19 '25 08:06

No one-to-one mapping is required; the numbers of pd, store, and server nodes can be scaled independently, and a 3pd-6store-3server layout is supported and is in fact a common pattern in distributed deployments. PD nodes handle cluster management and metadata; Store nodes handle data storage and can be added according to capacity and performance needs; Server nodes expose the API and can be scaled with request load. There is no 1:1:1 architectural requirement. In practice you only need to keep the raft, store, and pd configuration items consistent across nodes and make sure the nodes can reach each other over the network. For detailed configuration and common issues, see the related discussion and deployment-experience links.

It is worth paying attention to the raft-related settings, the number of shards, monitoring, and logs: keep the configuration consistent across nodes, avoid port conflicts, and do not share data directories, so as to avoid common distributed-deployment problems. If you have more specific performance or capacity requirements, the number of store or server nodes can be tuned further.
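
To make "consistent raft configuration" concrete, below is a minimal sketch of the raft block in a PD node's application.yml. The key names (raft.address, raft.peers-list) follow the example PD configs, but the exact keys and default ports should be verified against the application.yml template shipped with hugegraph-pd 1.5.0; the IP and ports here are placeholders.

  # Sketch of the raft section in configs/application-pd0.yml (illustrative only)
  raft:
    # This PD node's own raft endpoint; with network_mode: host, use the host IP
    # and a per-node port (8610/8611/8612 are placeholders)
    address: 127.0.0.1:8610
    # Identical on pd0, pd1 and pd2: the raft addresses of ALL PD nodes
    peers-list: 127.0.0.1:8610,127.0.0.1:8611,127.0.0.1:8612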

dosubot[bot] · Jun 19 '25 08:06

How do server nodes communicate with each other? There is nothing in the server configuration like the pd's raft.peers-list (the raft addresses of all PD nodes in the cluster), is there?

Tommu10ve · Jun 19 '25 09:06

Server nodes communicate mainly over HTTP and RPC, exposing the API and handling task scheduling. They are a stateless API layer and do not take part in low-level data consistency or metadata management, so their configuration does not need a cluster-membership list like the PD node's raft.peers-list. Raft consensus and cluster membership discovery are implemented only in the PD layer (metadata and scheduling) and the Store layer (data storage); a server node only needs to know how to reach the pd/store services, and does not form a raft group or synchronize state with other server nodes.

Concretely, a server node listens for HTTP according to the REST_SERVER_URL setting and also starts an internal RPC service used for distributed tasks, permissions, and similar features, but these work through service registration and RPC address discovery rather than a raft consensus group. You can see the HTTP and RPC implementation details in the RestServer and GraphManager code, and the server node's configuration and startup flow confirm this as well. (Related code reference; RPC mechanism reference.)
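
For reference, a server node's connection to the cluster boils down to its own REST listen address plus the PD endpoints in the graph's backend configuration. The snippet below is only a sketch: restserver.url is the standard rest-server.properties key, while the backend=hstore and pd.peers lines follow the example graph-properties files for PD/Store deployments and should be checked against your 1.5.0 templates (the PD gRPC ports shown are placeholders).

  # rest-server.properties (sketch): each server listens on its own port
  restserver.url=http://0.0.0.0:8081

  # graphs/hugegraph.properties (sketch, assuming the hstore/PD backend):
  # every server points at the SAME PD endpoints, so they all serve the same data
  backend=hstore
  pd.peers=127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688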

dosubot[bot] · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

Tommu10ve · Jun 19 '25 09:06

What are the respective roles of the raft, store, and pd configuration sections?

Tommu10ve · Jun 19 '25 09:06

The three kinds of configuration play the following roles:

  • The raft configuration (e.g. address, peers-list, group_peers) sets up the Raft consensus cluster that keeps distributed data and metadata consistent and highly available. It appears mainly in the PD and Store node configs, must be consistent across all nodes, lists the addresses of every node participating in Raft, and determines the cluster's fault tolerance and synchronization behavior. (Configuration example)

  • The pd configuration (e.g. patrol-interval, data-path, store-list) applies to PD (Placement Driver) nodes, which are responsible for cluster metadata management, partitioning/sharding, scheduling, and monitoring. The PD nodes are the "brain" of the cluster, managing Store node registration, partition assignment, data migration, and so on. (Configuration example)

  • The store configuration (e.g. pdserver.address, raft.address, data-path, keepAlive-timeout) applies to Store nodes, which handle the actual data storage, partition replicas, communication with PD, Raft log replication, and so on. A Store node connects to PD via pdserver.address and participates in Raft consensus, ensuring data reliability and distributed scaling. (Configuration example)

Together they give distributed HugeGraph its high availability, scalability, and consistency: Raft provides consistency, PD handles scheduling and metadata, and Store handles data storage and replicas. Whether these items are configured correctly directly affects cluster stability and performance. (Reference 1; Reference 2) A sketch of how they map onto a Store node's config file is shown below.
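
The following sketch shows where these items typically sit in a Store node's application.yml, using the key names mentioned above (pdserver.address, data-path, raft.address). Treat the exact paths, ports, and the app section name as placeholders to verify against the hugegraph-store 1.5.0 template.

  # Sketch of configs/application-store0.yml (illustrative only)
  pdserver:
    # gRPC endpoints of ALL PD nodes: how this store finds the cluster "brain"
    address: 127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688
  app:
    # Per-node data directory; must not be shared between store instances
    data-path: ./storage/store0
  raft:
    # This store's own raft endpoint, used for partition-replica log replication
    address: 127.0.0.1:8510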

dosubot[bot] · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

@dosu-bot please answer the user's question.

Tommu10ve · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

In PD mode the auth data no longer lives on the server side (previously storage and compute were combined); it is maintained centrally on the PD side, so there is simply no inconsistency problem.

The server itself is stateless, so it basically does not need any consistency maintenance.
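
If it helps, enabling auth on each server is just the same authenticator setting repeated in every server's rest-server.properties; the auth metadata itself (users, groups, targets) lives in the shared backend, not on any individual server. The class name below is the standard authenticator from the docs; double-check it against your 1.5.0 configuration.

  # rest-server.properties on server1 / server2 / server3 (identical setting):
  # the authenticator decides HOW auth is enforced, while the auth data itself
  # is stored in the shared PD/Store backend
  auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator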

imbajin · Jun 19 '25 11:06

What is the point of deploying multiple server nodes? What is the difference between one server and multiple servers?

@dosu-bot please answer the user's question.

Tommu10ve · Jun 20 '25 07:06

The main purpose of deploying multiple server nodes is to improve concurrency, load balancing, and high availability. Multiple server nodes share the API request load, raising overall throughput and response speed, and if one server node goes down the others keep serving traffic, avoiding a single point of failure.

A single server node is fine for small-scale, low-concurrency scenarios, but once traffic grows or high availability is needed, deploying multiple server nodes is recommended. HugeGraph's server layer is stateless: all auth and graph data lives in the backend (e.g. RocksDB, HBase), so server nodes do not need to synchronize local state and are simple to scale and maintain. As long as the configuration points at the same backend, the number of servers can be adjusted flexibly to match different scale and reliability requirements.

For implementation details, see the RestServer and GraphManager source code: server nodes serve over HTTP/RPC and support horizontal scaling and failover, without participating in raft consensus or maintaining a peers-list the way PD/Store nodes do. (Code reference 1; Code reference 2)
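
In terms of the compose file above, adding another stateless API node is just one more service following the same pattern. The server4 service below is hypothetical: it would listen on its own port (8084 here) and its mounted conf directory would point at the same PD/Store cluster as the existing servers.

  # Hypothetical extra API node under `services:`, mirroring server1..3
  server4:
    image: hugegraph/server
    container_name: server4
    hostname: server4
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8084"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server4-conf:/hugegraph-server/conf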

dosubot[bot] · Jun 20 '25 07:06

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days; any update will remove the stale label.

github-actions[bot] · Jul 05 '25 21:07