Idea: Shard by geolocation
Goal
Store all records for users in a given geolocation only on workers in that geolocation.
Options
Create a distributed table for each geolocation
Need some way to wrap/encapsulate the tables
Distribute a table by multiple columns
Elegant, but requires a lot of changes:
- the planner needs to detect a join on two columns
- range (geo_location) + hash (tenant) distribution would be nice
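A hypothetical sketch of what the multi-column distribution API could look like (the `distribution_columns`/`distribution_methods` parameters are invented for illustration; they are not existing Citus API):

```sql
-- HYPOTHETICAL API: sketches combined range (location) + hash (tenant_id)
-- distribution in a single call; not supported by Citus today
SELECT create_distributed_table('global_events',
                                distribution_columns := ARRAY['location', 'tenant_id'],
                                distribution_methods := ARRAY['range', 'hash']);
```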
Need some way to wrap/encapsulate the tables
One approach would be to have hierarchical distributed tables. The top-level table is range-distributed by geolocation and its shards are placed on the coordinator itself. Each shard is itself another distributed table in a separate co-location group.
The poor man's version of this would be:
CREATE TABLE global_events (location text, tenant_id bigint, event_payload jsonb, event_id bigserial, event_time timestamptz);
CREATE TABLE global_events_1 (location text, tenant_id bigint, event_payload jsonb, event_id bigserial, event_time timestamptz);
CREATE TABLE global_events_2 (location text, tenant_id bigint, event_payload jsonb, event_id bigserial, event_time timestamptz);
SELECT create_distributed_table('global_events_1', 'tenant_id', colocate_with := 'none');
SELECT create_distributed_table('global_events_2', 'tenant_id', colocate_with := 'none');
SELECT create_distributed_table('global_events', 'location', 'range');
-- make sure the coordinator is added to itself
SELECT * FROM master_add_node('localhost', 5432);
-- set up the metadata for the shards
INSERT INTO pg_dist_shard VALUES ('global_events', 1, 'r', 'EU', 'EU');
INSERT INTO pg_dist_shard VALUES ('global_events', 2, 'r', 'US', 'US');
INSERT INTO pg_dist_shard_placement VALUES (1, 1, 0, 'localhost', 5432);
INSERT INTO pg_dist_shard_placement VALUES (2, 1, 0, 'localhost', 5432);
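The hand-written metadata can be sanity-checked against the Citus catalogs:

```sql
-- verify the range shard intervals registered for global_events
SELECT logicalrelid, shardid, shardstorage, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'global_events'::regclass;
```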
-- do a globally distributed transaction
BEGIN;
INSERT INTO global_events VALUES ('EU', 1, '{"hello":"world"}');
INSERT INTO global_events VALUES ('US', 1, '{"hello":"world"}');
SELECT * FROM global_events;
END;
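A filter on the range distribution column should presumably prune to a single geo shard, which is itself one of the hash-distributed tables; a sketch of the intended behavior:

```sql
-- intended behavior: prune to the EU shard only, i.e. route the query
-- to global_events_1 and let the inner planner handle it from there
SELECT * FROM global_events WHERE location = 'EU';
```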
Not all operations are supported:
TRUNCATE global_events;
ERROR: canceling the transaction since it was involved in a distributed deadlock
CONTEXT: while executing command on localhost:5432
ALTER TABLE global_events ADD COLUMN newcol int;
ERROR: relation "global_events" does not exist
CONTEXT: while executing command on localhost:5432
DELETE FROM global_events;
WARNING: cannot use 2PC in transactions involving multiple servers
WARNING: connection error: localhost:5432
DETAIL: another command is already in progress
WARNING: failed to roll back prepared transaction 'citus_0_39580_77_1'
HINT: Run "ROLLBACK PREPARED 'citus_0_39580_77_1'" on localhost:5432
ERROR: cannot use 2PC in transactions involving multiple servers
CONTEXT: while executing command on localhost:5432
Once we implement #2830 we should be able to handle most operations (except COPY) in the coordinator backend, which would resolve a lot of these issues.
We should also implement #2512 to allow the coordinator to be in pg_dist_node without placing shards there.
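For reference, a sketch of keeping the coordinator in pg_dist_node without regular shard placements, assuming the `citus_set_node_property` function available in later Citus releases (an assumption relative to the Citus version this note targets):

```sql
-- assumes citus_set_node_property exists (later Citus releases);
-- keeps the coordinator in pg_dist_node but excludes it from
-- placement of new shards
SELECT * FROM master_add_node('localhost', 5432);
SELECT citus_set_node_property('localhost', 5432, 'shouldhaveshards', false);
```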