
[Question] When setting up a cluster, do the pd, store, and server nodes need a one-to-one mapping? Does a 3pd-6store-3server layout cause any problems?

Open Tommu10ve opened this issue 6 months ago • 10 comments

Problem Type

None

Before submit

  • [x] I have confirmed and searched that there are no similar or duplicate problems in the existing Issues and FAQ documents

Environment

  • Server Version: 1.5.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD

Your Question

When setting up the cluster, do the pd, store, and server nodes need to correspond one-to-one? Does a 3pd-6store-3server layout cause any problems?


version: "3"

services:
  pd0:
    image: hugegraph/pd
    container_name: pd0
    hostname: pd0
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8620"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd0.yml:/hugegraph-pd/conf/application.yml

  pd1:
    image: hugegraph/pd
    container_name: pd1
    hostname: pd1
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8621"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd1.yml:/hugegraph-pd/conf/application.yml

  pd2:
    image: hugegraph/pd
    container_name: pd2
    hostname: pd2
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8622"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-pd2.yml:/hugegraph-pd/conf/application.yml

  store0:
    image: hugegraph/store
    container_name: store0
    hostname: store0
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8520"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store0.yml:/hugegraph-store/conf/application.yml

  store1:
    image: hugegraph/store
    container_name: store1
    hostname: store1
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8521"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store1.yml:/hugegraph-store/conf/application.yml

  store2:
    image: hugegraph/store
    container_name: store2
    hostname: store2
    network_mode: host
    depends_on:
      pd0:
        condition: service_healthy
      pd1:
        condition: service_healthy
      pd2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8522"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/application-store2.yml:/hugegraph-store/conf/application.yml

  server1:
    image: hugegraph/server
    container_name: server1
    hostname: server1
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server1-conf:/hugegraph-server/conf

  server2:
    image: hugegraph/server
    container_name: server2
    hostname: server2
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8082"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server2-conf:/hugegraph-server/conf

  server3:
    image: hugegraph/server
    container_name: server3
    hostname: server3
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
      store1:
        condition: service_healthy
      store2:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8083"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server3-conf:/hugegraph-server/conf

Vertex/Edge example


Schema [VertexLabel, EdgeLabel, IndexLabel]


Tommu10ve · Jun 19 '25 08:06

No one-to-one mapping is required; the numbers of pd, store, and server nodes can be scaled independently, and a 3pd-6store-3server layout is supported and is in fact a common pattern in distributed deployments. PD nodes handle cluster management and metadata; Store nodes handle data storage and can be added according to capacity and performance needs; Server nodes expose the API and can be scaled with request load. There is no 1:1:1 architectural requirement. In practice you only need to keep the raft, store, and pd configuration items consistent across nodes and make sure the nodes can reach each other over the network. For detailed configuration and common issues, see the related discussion and deployment-experience links.

It is worth paying attention to the raft-related settings, the number of shards, monitoring, and logs: keep the configuration consistent across nodes, avoid port conflicts, and do not share data directories, so as to avoid common distributed-deployment problems. If you have more specific performance or capacity requirements, the number of store or server nodes can be tuned further.
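
To make "consistent raft configuration" concrete, below is a minimal sketch of the raft block in a PD node's application.yml. The key names (raft.address, raft.peers-list) follow the example PD configs, but the exact keys and default ports should be verified against the application.yml template shipped with hugegraph-pd 1.5.0; the IP and ports here are placeholders.

  # Sketch of the raft section in configs/application-pd0.yml (illustrative only)
  raft:
    # This PD node's own raft endpoint; with network_mode: host, use the host IP
    # and a per-node port (8610/8611/8612 are placeholders)
    address: 127.0.0.1:8610
    # Identical on pd0, pd1 and pd2: the raft addresses of ALL PD nodes
    peers-list: 127.0.0.1:8610,127.0.0.1:8611,127.0.0.1:8612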

dosubot[bot] · Jun 19 '25 08:06

How do server nodes communicate with each other? There is nothing in the server configuration like the pd's raft.peers-list (the raft addresses of all PD nodes in the cluster), is there?

Tommu10ve · Jun 19 '25 09:06

Server nodes communicate mainly over HTTP and RPC, exposing the API and handling task scheduling. They are a stateless API layer and do not take part in low-level data consistency or metadata management, so their configuration does not need a cluster-membership list like the PD node's raft.peers-list. Raft consensus and cluster membership discovery are implemented only in the PD layer (metadata and scheduling) and the Store layer (data storage); a server node only needs to know how to reach the pd/store services, and does not form a raft group or synchronize state with other server nodes.

Concretely, a server node listens for HTTP according to the REST_SERVER_URL setting and also starts an internal RPC service used for distributed tasks, permissions, and similar features, but these work through service registration and RPC address discovery rather than a raft consensus group. You can see the HTTP and RPC implementation details in the RestServer and GraphManager code, and the server node's configuration and startup flow confirm this as well. (Related code reference; RPC mechanism reference.)
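
For reference, a server node's connection to the cluster boils down to its own REST listen address plus the PD endpoints in the graph's backend configuration. The snippet below is only a sketch: restserver.url is the standard rest-server.properties key, while the backend=hstore and pd.peers lines follow the example graph-properties files for PD/Store deployments and should be checked against your 1.5.0 templates (the PD gRPC ports shown are placeholders).

  # rest-server.properties (sketch): each server listens on its own port
  restserver.url=http://0.0.0.0:8081

  # graphs/hugegraph.properties (sketch, assuming the hstore/PD backend):
  # every server points at the SAME PD endpoints, so they all serve the same data
  backend=hstore
  pd.peers=127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688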

dosubot[bot] · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

Tommu10ve · Jun 19 '25 09:06

What are the respective roles of the raft, store, and pd configuration sections?

Tommu10ve · Jun 19 '25 09:06

The three kinds of configuration play the following roles:

  • The raft configuration (e.g. address, peers-list, group_peers) sets up the Raft consensus cluster that keeps distributed data and metadata consistent and highly available. It appears mainly in the PD and Store node configs, must be consistent across all nodes, lists the addresses of every node participating in Raft, and determines the cluster's fault tolerance and synchronization behavior. (Configuration example)

  • The pd configuration (e.g. patrol-interval, data-path, store-list) applies to PD (Placement Driver) nodes, which are responsible for cluster metadata management, partitioning/sharding, scheduling, and monitoring. The PD nodes are the "brain" of the cluster, managing Store node registration, partition assignment, data migration, and so on. (Configuration example)

  • The store configuration (e.g. pdserver.address, raft.address, data-path, keepAlive-timeout) applies to Store nodes, which handle the actual data storage, partition replicas, communication with PD, Raft log replication, and so on. A Store node connects to PD via pdserver.address and participates in Raft consensus, ensuring data reliability and distributed scaling. (Configuration example)

Together they give distributed HugeGraph its high availability, scalability, and consistency: Raft provides consistency, PD handles scheduling and metadata, and Store handles data storage and replicas. Whether these items are configured correctly directly affects cluster stability and performance. (Reference 1; Reference 2) A sketch of how they map onto a Store node's config file is shown below.
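
The following sketch shows where these items typically sit in a Store node's application.yml, using the key names mentioned above (pdserver.address, data-path, raft.address). Treat the exact paths, ports, and the app section name as placeholders to verify against the hugegraph-store 1.5.0 template.

  # Sketch of configs/application-store0.yml (illustrative only)
  pdserver:
    # gRPC endpoints of ALL PD nodes: how this store finds the cluster "brain"
    address: 127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688
  app:
    # Per-node data directory; must not be shared between store instances
    data-path: ./storage/store0
  raft:
    # This store's own raft endpoint, used for partition-replica log replication
    address: 127.0.0.1:8510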

dosubot[bot] · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

@dosu-bot please answer the user's question.

Tommu10ve · Jun 19 '25 09:06

If auth is enabled on all 6 servers, how do we guarantee that they all use the same authentication data?

In PD mode the auth data no longer lives on the server side (previously storage and compute were combined); it is maintained centrally on the PD side, so there is simply no inconsistency problem.

The server itself is stateless, so it basically does not need any consistency maintenance.
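
If it helps, enabling auth on each server is just the same authenticator setting repeated in every server's rest-server.properties; the auth metadata itself (users, groups, targets) lives in the shared backend, not on any individual server. The class name below is the standard authenticator from the docs; double-check it against your 1.5.0 configuration.

  # rest-server.properties on server1 / server2 / server3 (identical setting):
  # the authenticator decides HOW auth is enforced, while the auth data itself
  # is stored in the shared PD/Store backend
  auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator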

imbajin · Jun 19 '25 11:06

What is the point of deploying multiple server nodes? What is the difference between one server and multiple servers?

@dosu-bot please answer the user's question.

Tommu10ve · Jun 20 '25 07:06

The main purpose of deploying multiple server nodes is to improve concurrency, load balancing, and high availability. Multiple server nodes share the API request load, raising overall throughput and response speed, and if one server node goes down the others keep serving traffic, avoiding a single point of failure.

A single server node is fine for small-scale, low-concurrency scenarios, but once traffic grows or high availability is needed, deploying multiple server nodes is recommended. HugeGraph's server layer is stateless: all auth and graph data lives in the backend (e.g. RocksDB, HBase), so server nodes do not need to synchronize local state and are simple to scale and maintain. As long as the configuration points at the same backend, the number of servers can be adjusted flexibly to match different scale and reliability requirements.

For implementation details, see the RestServer and GraphManager source code: server nodes serve over HTTP/RPC and support horizontal scaling and failover, without participating in raft consensus or maintaining a peers-list the way PD/Store nodes do. (Code reference 1; Code reference 2)
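
In terms of the compose file above, adding another stateless API node is just one more service following the same pattern. The server4 service below is hypothetical: it would listen on its own port (8084 here) and its mounted conf directory would point at the same PD/Store cluster as the existing servers.

  # Hypothetical extra API node under `services:`, mirroring server1..3
  server4:
    image: hugegraph/server
    container_name: server4
    hostname: server4
    network_mode: host
    depends_on:
      store0:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8084"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - ./configs/server4-conf:/hugegraph-server/conf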

dosubot[bot] · Jun 20 '25 07:06

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days; any update will remove the stale label.

github-actions[bot] · Jul 05 '25 21:07