gpdb
Initialise GPDB cluster using `gp` utility
Following the gp services framework PR, this PR adds the functionality to initialize the cluster using the underlying framework. The current PR covers the following:
- Cluster initialization with the provided configuration
- Unit tests for most of the code paths
- Support for JSON, YAML, and TOML config files
High-level flow:
- Invoke CLI: `gp init cluster <config-file>`
- Config Validation: Validates input configuration
- Request to hub: RPC on Hub to initialize the cluster
- Hub side validation: Validate the host environment on each host
- Create Coordinator: Create the coordinator and register primaries
- Create Primaries
- Start Cluster
The program flow:
- `cli/init.go:initCmd()`: Starting point for the init command `gp init cluster`. Performs basic flag checks and config file validity checks. Forms the cluster creation request and calls the RPC service on the Hub to create the cluster.
- `hub/make_cluster.go:MakeCluster()`: Starting point for the `MakeCluster` RPC, which creates the actual cluster.
- `MakeCluster` calls for validation of the host environment, where multiple checks are done and errors/warnings are returned.
- `MakeCluster` creates the coordinator segment and then starts it.
- Register primaries to the coordinator and get content-id and dbid for the segments.
- Post registration, MakeCluster creates primary segments in parallel on all hosts
- Stop the coordinator to restart the cluster.
- Start the cluster using `gpstart`
- Create GPDB extensions
- Import collations
- Setup GPDB superuser password
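The parallel creation of primaries in the flow above can be sketched as a fan-out with one goroutine per segment. This is only an illustration under stated assumptions, not the PR's code: `Segment`, `createPrimaries`, `errPortBusy`, and the `makeSegment` callback (standing in for the agent `MakeSegment` RPC) are hypothetical names, and `errors.Join` needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Segment is a hypothetical, trimmed-down description of one primary.
type Segment struct {
	Hostname string
	Port     int
	DataDir  string
}

// errPortBusy is a sample failure used only by the demo below.
var errPortBusy = errors.New("port already in use")

// createPrimaries calls makeSegment for every segment concurrently and
// returns all failures joined into one error (nil if every call succeeded).
// makeSegment stands in for the per-host agent RPC; it is an assumption.
func createPrimaries(segs []Segment, makeSegment func(Segment) error) error {
	var wg sync.WaitGroup
	errs := make([]error, len(segs)) // one slot per goroutine, so no locking is needed

	for i, seg := range segs {
		wg.Add(1)
		go func(i int, seg Segment) {
			defer wg.Done()
			if err := makeSegment(seg); err != nil {
				errs[i] = fmt.Errorf("%s:%d: %w", seg.Hostname, seg.Port, err)
			}
		}(i, seg)
	}
	wg.Wait()
	return errors.Join(errs...) // nil entries are discarded
}

func main() {
	segs := []Segment{
		{"host-1", 7002, "/tmp/demo/1"},
		{"host-1", 7003, "/tmp/demo/2"},
	}
	// Simulate one failing segment to show how errors are reported.
	err := createPrimaries(segs, func(s Segment) error {
		if s.Port == 7003 {
			return errPortBusy
		}
		return nil
	})
	fmt.Println("result:", err)
}
```

Collecting every failure instead of stopping at the first one lets the caller report all problem hosts in a single pass, which is one plausible design for a multi-host initialization step.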
List of RPCs added:
Hub
- `hub/make_cluster.go:MakeCluster`: Accepts the request to initialize a GPDB cluster. Validates the host environment on each host, then creates the coordinator followed by the primaries.
Agent
- `agent/validate_host_env.go:ValidateHostEnv`: Runs the host environment checks on the local host. Returns errors/warnings, if any.
- `agent/make_segment.go:MakeSegment`: Creates and configures a new segment. Internally uses `initdb` to create the postgres instance and updates `postgresql.conf` and `pg_hba.conf`.
- `agent/start_segment.go:StartSegment`: Starts a segment using the `pg_ctl start` command.
Example
Setup Pre-requisites: Ensure that the gp services are configured and started before the cluster initialization begins. The steps to set up the gp services are as follows:
make; make install
gp configure <configure options like certs, host-list etc>
gp start services
Confirm that GP services are running. Check the status of the GP services:
gp status services
All the services must be running, and every host that is part of the GPDB cluster must have a running agent service.
Initialising cluster:
gp init cluster cluster-config.json
gp init cluster <input-config-file> [--force]
input-config-file: JSON/YAML/TOML file containing the cluster configuration.
--force flag: override the previous installation by deleting existing data directories.
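Since the input file may be JSON, YAML, or TOML, the CLI has to decide which parser to use. How the PR does this is not stated here; one simple possibility is to dispatch on the file extension, sketched below. The function name `configFormat` and the accepted extensions (including `.yml`) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// configFormat guesses the config format from the file extension.
// This is a hypothetical helper, not part of the gp utility.
func configFormat(path string) (string, error) {
	ext := strings.ToLower(filepath.Ext(path))
	switch ext {
	case ".json":
		return "json", nil
	case ".yaml", ".yml":
		return "yaml", nil
	case ".toml":
		return "toml", nil
	default:
		return "", fmt.Errorf("unsupported config file extension %q", ext)
	}
}

func main() {
	for _, p := range []string{"cluster-config.json", "cluster.YAML", "cluster.txt"} {
		format, err := configFormat(p)
		if err != nil {
			fmt.Println(p, "->", err)
			continue
		}
		fmt.Println(p, "->", format)
	}
}
```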
Sample input configuration file (Please replace the hostname with the actual hostname):
{
"encoding":"Unicode",
"hba-hostnames":true,
"su-password":"gparray",
"data-checksums":true,
"locale":{
"lc-all":"en_US.UTF-8",
"lc-collate":"en_US.UTF-8",
"lc-ctype":"en_US.UTF-8",
"lc-messages":"en_US.UTF-8",
"lc-monetary":"en_US.UTF-8",
"lc-numeric":"en_US.UTF-8",
"lc-time":"en_US.UTF-8"
},
"common-config":{
"shared_buffers":"128000kB"
},
"coordinator-config":{
"max_connections":50
},
"segment-config":{
"max_connections":150,
"debug_pretty_print":"off",
"log_min_messages":"warning"
},
"coordinator":{
"hostname":"host-1",
"address":"host-1",
"port":7000,
"data-directory":"/tmp/demo/0"
},
"primary-segments-array":[
{
"hostname":"host-1",
"address":"host-1",
"port":7002,
"data-directory":"/tmp/demo/1"
},
{
"hostname":"host-1",
"address":"host-1",
"port":7003,
"data-directory":"/tmp/demo/2"
},
{
"hostname":"host-1",
"address":"host-1",
"port":7004,
"data-directory":"/tmp/demo/3"
},
{
"hostname":"host-1",
"address":"host-1",
"port":7005,
"data-directory":"/tmp/demo/4"
}
]
}
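To make the config validation step concrete, here is a minimal sketch of how a file like the sample above might be decoded and sanity-checked. The key names come from the sample; the Go type names, the function `parseAndValidate`, and the specific checks (non-empty fields, no duplicate port or data directory on the same host) are assumptions for illustration and are not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Segment mirrors one entry of the sample file; the Go names are hypothetical.
type Segment struct {
	Hostname      string `json:"hostname"`
	Address       string `json:"address"`
	Port          int    `json:"port"`
	DataDirectory string `json:"data-directory"`
}

// ClusterConfig covers only the fields this sketch checks; other keys are ignored.
type ClusterConfig struct {
	Coordinator Segment   `json:"coordinator"`
	Primaries   []Segment `json:"primary-segments-array"`
}

// parseAndValidate decodes a JSON config and applies a few illustrative checks.
func parseAndValidate(data []byte) (*ClusterConfig, error) {
	var cfg ClusterConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	if len(cfg.Primaries) == 0 {
		return nil, fmt.Errorf("no primary segments defined")
	}

	ports := map[string]bool{} // "host:port" pairs already seen
	dirs := map[string]bool{}  // "host:data-directory" pairs already seen
	all := append([]Segment{cfg.Coordinator}, cfg.Primaries...)
	for _, seg := range all {
		if seg.Hostname == "" || seg.DataDirectory == "" || seg.Port <= 0 {
			return nil, fmt.Errorf("incomplete segment entry: %+v", seg)
		}
		portKey := fmt.Sprintf("%s:%d", seg.Hostname, seg.Port)
		dirKey := seg.Hostname + ":" + seg.DataDirectory
		if ports[portKey] {
			return nil, fmt.Errorf("duplicate port %s", portKey)
		}
		if dirs[dirKey] {
			return nil, fmt.Errorf("duplicate data directory %s", dirKey)
		}
		ports[portKey] = true
		dirs[dirKey] = true
	}
	return &cfg, nil
}

func main() {
	sample := []byte(`{
	  "coordinator": {"hostname": "host-1", "address": "host-1", "port": 7000, "data-directory": "/tmp/demo/0"},
	  "primary-segments-array": [
	    {"hostname": "host-1", "address": "host-1", "port": 7002, "data-directory": "/tmp/demo/1"}
	  ]
	}`)
	cfg, err := parseAndValidate(sample)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("coordinator on %s:%d with %d primaries\n",
		cfg.Coordinator.Hostname, cfg.Coordinator.Port, len(cfg.Primaries))
}
```

Catching conflicts such as a reused port before any RPC is sent keeps the failure cheap; checks that need the actual host (free ports, existing directories) belong to the hub-side validation described earlier.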
Testing
- Creating multi segment cluster on localhost
- Creating multi host multi segment setup
- Support of YAML, TOML, JSON files
Performance comparison:
| Index | Setup | Legacy gpinitsystem | Proposed gp cluster utility |
|---|---|---|---|
| 1 | 4 hosts, 60 segments | 1m41.346s | 0m20.462s |
| 2 | 4 hosts, 120 segments | 3m12.888s | 0m33.967s |
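For readers who want the ratio rather than raw timings, the table works out to roughly a 5x speedup for 60 segments and roughly 5.7x for 120 segments. The short sketch below reproduces that arithmetic from the table values; the helper name `speedup` is introduced here for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster `proposed` is than `legacy`.
// Both arguments use Go duration syntax, e.g. "1m41.346s".
func speedup(legacy, proposed string) (float64, error) {
	l, err := time.ParseDuration(legacy)
	if err != nil {
		return 0, fmt.Errorf("legacy timing: %w", err)
	}
	p, err := time.ParseDuration(proposed)
	if err != nil {
		return 0, fmt.Errorf("proposed timing: %w", err)
	}
	if p <= 0 {
		return 0, fmt.Errorf("proposed timing must be positive, got %v", p)
	}
	return l.Seconds() / p.Seconds(), nil
}

func main() {
	// Timings copied from the performance comparison table.
	rows := []struct{ setup, legacy, proposed string }{
		{"4 hosts, 60 segments", "1m41.346s", "0m20.462s"},
		{"4 hosts, 120 segments", "3m12.888s", "0m33.967s"},
	}
	for _, r := range rows {
		s, err := speedup(r.legacy, r.proposed)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %.2fx faster\n", r.setup, s)
	}
}
```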
Unit Test Coverage:
github.com/greenplum-db/gpdb/gp/agent: 89.4%
github.com/greenplum-db/gpdb/gp/cli: 53.2%
github.com/greenplum-db/gpdb/gp/hub: 69.6%
github.com/greenplum-db/gpdb/gp/utils: 91.9%
github.com/greenplum-db/gpdb/gp/utils/greenplum: 98.3%
github.com/greenplum-db/gpdb/gp/utils/postgres: 99.3%
Total: 76.42%
The following will be covered in a future PR:
- Support for mirrors
- Support for expandable config file
- Functional tests for the `gp init` command
- Generate a sample config file using the `-o` option
Here are some reminders before you submit the pull request
- [ ] Add tests for the change
- [ ] Document changes
- [ ] Communicate in the mailing list if needed
- [ ] Pass `make installcheck`
- [ ] Review a PR in return to support the community