
Initialise GPDB cluster using `gp` utility

Open jnihal opened this issue 1 year ago • 2 comments

Following the gp services framework PR, this PR adds the functionality to initialize the cluster using the underlying framework. It covers the following:

  • Cluster initialization with the provided configuration
  • Unit tests for most code paths
  • Support for JSON, YAML, and TOML config files

High-level flow: Initialize Cluster

  1. Invoke CLI: gp init cluster <config-file>
  2. Config Validation: Validates input configuration
  3. Request to hub: RPC on Hub to initialize the cluster
  4. Hub side validation: Validate the host environment on each host
  5. Create Coordinator: Create the coordinator and register primaries
  6. Create Primaries
  7. Start Cluster

The program flow:

  • cli/init.go: initCmd() Starting point for the init command, gp init cluster. Performs basic flag checks and config file validity checks. Forms the cluster creation request and calls the RPC service on the Hub to create the cluster.
  • hub/make_cluster.go: MakeCluster() Starting point for the MakeCluster RPC, which creates the actual cluster.
  • MakeCluster requests validation of the host environment on each host; multiple checks are run, each returning errors or warnings.
  • MakeCluster creates the coordinator segment and then starts it.
  • Register the primaries with the coordinator and get the content-id and dbid for each segment.
  • After registration, MakeCluster creates the primary segments in parallel on all hosts.
  • Stop the coordinator so that the whole cluster can be started.
  • Start the cluster using gpstart
  • Create GPDB extensions
  • Import collations
  • Set up the GPDB superuser password

List of RPCs added:

Hub

  • hub/make_cluster.go: MakeCluster: Initializes the GPDB cluster on request. Calls the agents to validate the host environment on each host, then creates the coordinator followed by the primaries.

Agent

  • agent/validate_host_env.go: ValidateHostEnv: Runs various host environment checks on the local host. Returns errors/warnings, if any.
  • agent/make_segment.go: MakeSegment: Creates and configures a new segment. Internally uses initdb to create the postgres instance and updates postgresql.conf and pg_hba.conf.
  • agent/start_segment.go: StartSegment: Starts a segment using the pg_ctl start command.
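For illustration, the initdb and pg_ctl start invocations behind MakeSegment and StartSegment might be assembled as below. This is a sketch and not the code in agent/make_segment.go or agent/start_segment.go: initdbCommand and startCommand are hypothetical helpers, only standard PostgreSQL flags (-D, -w, -o) are shown, and the real agent presumably passes additional Greenplum-specific options. The commands are built but never executed.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// initdbCommand builds (but does not run) an initdb invocation that would
// create a new postgres instance in dataDir.
func initdbCommand(dataDir string) *exec.Cmd {
	return exec.Command("initdb", "-D", dataDir)
}

// startCommand builds (but does not run) a pg_ctl start invocation.
// -w waits until startup completes; -o passes options through to the server,
// here the port to listen on.
func startCommand(dataDir string, port int) *exec.Cmd {
	return exec.Command("pg_ctl", "start", "-D", dataDir, "-w", "-o", "-p "+strconv.Itoa(port))
}

func main() {
	// Print the argument vectors only; nothing is executed.
	fmt.Println(initdbCommand("/tmp/demo/1").Args)
	fmt.Println(startCommand("/tmp/demo/1", 7002).Args)
}
```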

Example

Setup prerequisites: Ensure that the gp services are configured and started before cluster initialization begins. The steps to set up the gp services are as follows:

make; make install
gp configure <configure options like certs, host-list etc>
gp start services

Confirm that GP services are running. Check the status of the GP services:

gp status services

All services must be running, and every host that is part of the GPDB cluster must have a running agent service.

Initialising cluster:

gp init cluster cluster-config.json

gp init cluster <input-config-file> [--force]

input-config-file: JSON/YAML/TOML file containing the cluster configuration.
--force flag: overrides a previous installation by deleting the existing data directories.
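One simple way to choose a parser for the three supported formats is to look at the file extension. The sketch below assumes that approach; the PR may well detect the format differently (for example through a config library), and configFormat is a hypothetical helper, not a function from cli/init.go.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// configFormat maps a config file path to "json", "yaml", or "toml" based on
// its extension (case-insensitive). Any other extension is an error.
func configFormat(path string) (string, error) {
	switch ext := strings.ToLower(filepath.Ext(path)); ext {
	case ".json":
		return "json", nil
	case ".yaml", ".yml":
		return "yaml", nil
	case ".toml":
		return "toml", nil
	default:
		return "", fmt.Errorf("unsupported config file extension %q", ext)
	}
}

func main() {
	for _, p := range []string{"cluster-config.json", "cluster-config.yml", "cluster-config.ini"} {
		format, err := configFormat(p)
		fmt.Println(p, format, err)
	}
}
```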

Sample input configuration file (replace the hostnames with your actual hostnames):

{
   "encoding":"Unicode",
   "hba-hostnames":true,
   "su-password":"gparray",
   "data-checksums":true,
   "locale":{
      "lc-all":"en_US.UTF-8",
      "lc-collate":"en_US.UTF-8",
      "lc-ctype":"en_US.UTF-8",
      "lc-messages":"en_US.UTF-8",
      "lc-monetary":"en_US.UTF-8",
      "lc-numeric":"en_US.UTF-8",
      "lc-time":"en_US.UTF-8"
   },
   "common-config":{
      "shared_buffers":"128000kB"
   },
   "coordinator-config":{
      "max_connections":50
   },
   "segment-config":{
      "max_connections":150,
      "debug_pretty_print":"off",
      "log_min_messages":"warning"
   },
   "coordinator":{
      "hostname":"host-1",
      "address":"host-1",
      "port":7000,
      "data-directory":"/tmp/demo/0"
   },
   "primary-segments-array":[
      {
         "hostname":"host-1",
         "address":"host-1",
         "port":7002,
         "data-directory":"/tmp/demo/1"
      },
      {
         "hostname":"host-1",
         "address":"host-1",
         "port":7003,
         "data-directory":"/tmp/demo/2"
      },
      {
         "hostname":"host-1",
         "address":"host-1",
         "port":7004,
         "data-directory":"/tmp/demo/3"
      },
      {
         "hostname":"host-1",
         "address":"host-1",
         "port":7005,
         "data-directory":"/tmp/demo/4"
      }
   ]
}
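To show how a file like this could be consumed, here is a minimal sketch that decodes a subset of the keys with encoding/json and rejects segments that share a port or data directory on the same host. The struct and function names (Segment, ClusterConfig, parseConfig, checkConflicts) are hypothetical, and the conflict check is only an example of what config validation might include, not a description of the checks this PR actually performs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Segment mirrors the per-segment keys used in the sample config.
type Segment struct {
	Hostname      string `json:"hostname"`
	Address       string `json:"address"`
	Port          int    `json:"port"`
	DataDirectory string `json:"data-directory"`
}

// ClusterConfig decodes only a subset of the sample file; keys without a
// matching field are ignored by encoding/json.
type ClusterConfig struct {
	Encoding    string    `json:"encoding"`
	Coordinator Segment   `json:"coordinator"`
	Primaries   []Segment `json:"primary-segments-array"`
}

// parseConfig decodes JSON bytes into a ClusterConfig.
func parseConfig(data []byte) (*ClusterConfig, error) {
	var cfg ClusterConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

// checkConflicts reports the first port or data directory that is used twice
// on the same host, across the coordinator and all primaries.
func checkConflicts(cfg *ClusterConfig) error {
	ports, dirs := map[string]bool{}, map[string]bool{}
	all := append([]Segment{cfg.Coordinator}, cfg.Primaries...)
	for _, s := range all {
		p := fmt.Sprintf("%s:%d", s.Hostname, s.Port)
		d := s.Hostname + ":" + s.DataDirectory
		if ports[p] {
			return fmt.Errorf("duplicate port %s", p)
		}
		if dirs[d] {
			return fmt.Errorf("duplicate data directory %s", d)
		}
		ports[p], dirs[d] = true, true
	}
	return nil
}

// sample is a trimmed version of the configuration shown above.
const sample = `{
  "encoding": "Unicode",
  "coordinator": {"hostname": "host-1", "address": "host-1", "port": 7000, "data-directory": "/tmp/demo/0"},
  "primary-segments-array": [
    {"hostname": "host-1", "address": "host-1", "port": 7002, "data-directory": "/tmp/demo/1"},
    {"hostname": "host-1", "address": "host-1", "port": 7003, "data-directory": "/tmp/demo/2"}
  ]
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("primaries:", len(cfg.Primaries), "conflict check:", checkConflicts(cfg))
}
```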

Testing

  • Creating multi segment cluster on localhost
  • Creating multi host multi segment setup
  • Support of YAML, TOML, JSON files

Performance comparison:

Index | Setup                 | Legacy gpinitsystem | Proposed gp cluster utility
1     | 4 hosts, 60 segments  | 1m41.346s           | 0m20.462s
2     | 4 hosts, 120 segments | 3m12.888s           | 0m33.967s
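The speedup implied by these timings can be computed directly from the values in the table. The helper below is a small sketch (speedup is a hypothetical name); it relies on time.ParseDuration accepting strings in the same form as the table entries.

```go
package main

import (
	"fmt"
	"time"
)

// speedup parses two durations such as "1m41.346s" and returns legacy/proposed.
func speedup(legacy, proposed string) (float64, error) {
	l, err := time.ParseDuration(legacy)
	if err != nil {
		return 0, err
	}
	p, err := time.ParseDuration(proposed)
	if err != nil {
		return 0, err
	}
	if p <= 0 {
		return 0, fmt.Errorf("proposed duration must be positive, got %v", p)
	}
	return l.Seconds() / p.Seconds(), nil
}

func main() {
	rows := []struct{ setup, legacy, proposed string }{
		{"4 hosts, 60 segments", "1m41.346s", "0m20.462s"},
		{"4 hosts, 120 segments", "3m12.888s", "0m33.967s"},
	}
	for _, r := range rows {
		s, err := speedup(r.legacy, r.proposed)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %.2fx faster\n", r.setup, s)
	}
}
```

For the two rows above this works out to roughly 5x and 5.7x faster than gpinitsystem.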

Unit Test Coverage:

github.com/greenplum-db/gpdb/gp/agent: 89.4%
github.com/greenplum-db/gpdb/gp/cli: 53.2%
github.com/greenplum-db/gpdb/gp/hub: 69.6%
github.com/greenplum-db/gpdb/gp/utils: 91.9%
github.com/greenplum-db/gpdb/gp/utils/greenplum: 98.3%
github.com/greenplum-db/gpdb/gp/utils/postgres: 99.3%
Total: 76.42%

The following will be covered in a future PR:

  • Support for mirrors
  • Support for expandable config file
  • Functional tests for gp init command
  • Generate sample config file using the -o option

Here are some reminders before you submit the pull request

  • [ ] Add tests for the change
  • [ ] Document changes
  • [ ] Communicate in the mailing list if needed
  • [ ] Pass make installcheck
  • [ ] Review a PR in return to support the community

jnihal, Dec 22 '23 10:12