support discrete shards and/or named shards

Open DeoLeung opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe. we are using schema base sharding(each tenant rely in one schema), and we name it something like dm_1, dm_3 with discrete values(historical reason...),

it will be best if there's a sharding method called 'manual', and supporting identifier rather than shard number

Describe the solution you'd like we want the config to support something like

#
# Shards in the cluster
[shards]

# Shard 0
[shards.1]

# [ host, port, role ]
servers = [
    [ "pg1", 5432, "primary" ],
    #[ "localhost", 5433, "replica" ],
    # [ "127.0.1.1", 5432, "replica" ],
]
# Database name (e.g. "postgres")
database = "postgres"
search_path = "\"$user\",dm_1,public"

[shards.3]
# [ host, port, role ]
servers = [
    [ "pg2", 5432, "primary" ],
    # [ "localhost", 5433, "replica" ],
    # [ "127.0.1.1", 5432, "replica" ],
]
database = "postgres"
search_path = "\"$user\",dm_3,public"

Describe alternatives you've considered if it's technical difficult, it's ok to stick to 0,1...n, just not that intuitive to users targeting a certain schema(and relating database)

Additional context currently it reports error since it counts the shards

postgres@cb1fed2d6844:/$ psql -h pgcat -p 6432
Password for user postgres:
psql (13.7 (Debian 13.7-1.pgdg110+1))
Type "help" for help.

postgres=#
postgres=#
postgres=# select * from test
postgres-# ;
 id
----
  1
  3
(2 rows)

postgres=# set shard to '2';
SET SHARD
postgres=# select * from test;
FATAL:  shard 2 is more than configured 2
postgres=# set shard to '3';
SET SHARD
postgres=# select * from test;
FATAL:  shard 3 is more than configured 2

Jun 22 '22 07:06 DeoLeung

I think you're describing a multi-tenant setup, not quite a sharding setup in it's intended use case. Sharding tends to split a single database into multiple databases that contain parts of the whole dataset and all the tenants are interested in querying all of the dataset; sharding effectively just decreases the load on the hardware and allows to support more queries. In your case I believe you'd be better served by configuring multiple databases, e.g.

psql -d tenant1 -c 'SELECT * FROM data LIMIT 1'
psql -d tenant2 -c 'SELECT * FROM data LIMIT 1'

where tenant1 and tenant2 are different databases and consequently tenants in your system.

PgCat doesn't support multiple databases yet, but that support is on the roadmap.

Jun 22 '22 15:06 levkk