Improvement ideas / backlog for single-shard tables / tenant-schema tables
Performance improvements that would be useful for both single-shard tables and tenant-schema tables:
- [ ] Fix prepared statements / plan cache for single-shard tables.
DDL improvements that could provide a better user experience when creating / altering tenant-schema tables:
- The following cannot be used to create a tenant table:
  - [ ] CREATE TABLE tenant_546.users OF TYPE ..
  - [ ] CREATE SCHEMA tenant_546 CREATE TABLE users ..
SQL / planner improvements that would be useful for both single-shard tables and tenant-schema tables:
- [ ] (High priority) (Medium) Support INSERT with sublinks.
- [ ] (High priority) (Medium) Support UPDATE with volatile functions.
- [ ] "INSERT INTO single_shard_table SELECT .." cannot go through repartitioned insert-select. Though I'm not sure if this is easily doable, because the code path essentially expects the target table to have a shard key.
- [ ] The planner unnecessarily decides that an outer join of the form < recurring_rel LEFT JOIN single_shard_table > would result in recurring tuples. This is not the case for single-shard tables, so such joins go through the recursive planner unnecessarily.
- [ ] Support non-router MERGE with single-shard tables with / without distributed tables.
UX improvements for tenant-schema tables that we might want to do depending on user feedback:
- [ ] Allow having usual distributed / reference tables etc. in tenant schemas, via alter_distributed_table() / create_distributed_table() / create_distributed_table_concurrently() / create_reference_table() / undistribute_table() (somewhat bigger item).
- [ ] Allow colocating tenant schemas.
- [x] Support routing in pgbouncer based on search_path.
- [ ] Enable foreign keys from reference tables to tenant tables without on update/delete cascade (should be fairly easy).
Technical / non-user-facing improvements for tenant-schema tables:
- [ ] Generalize Citus local tables into a single-shard group that’s pinned to the coordinator (somewhat bigger item).
- [ ] Evaluate if it's possible to combine ConvertNewTableIfNecessary() logic with Postprocess_CreateTable.
Operation improvements that are only about the single-shard tables (i.e., those that are not associated with a tenant schema):
- [ ] create_distributed_table_concurrently() doesn't support creating a single-shard table.
- [ ] alter_distributed_table() doesn't support altering a single-shard table.
- [ ] alter_distributed_table() doesn't support colocating a random table with a single-shard table.
- [ ] split_shards() could allow splitting the shard of a single-shard table by accepting a distribution column argument. Alternatively, allowing create_distributed_table_concurrently() to accept a single-shard table would help with the same scenario without requiring a syntax change.
Usability issues observed when doing basic testing with django-tenants:
- Django sends add/drop constraint commands together with set constraint commands in a single query string, as in:
  - SET CONSTRAINTS fkey IMMEDIATE; ALTER TABLE referencing_tbl DROP CONSTRAINT fkey;
  - ALTER TABLE referencing_tbl ADD COLUMN id integer DEFAULT 1 NOT NULL CONSTRAINT fkey REFERENCES referenced_tbl(id) DEFERRABLE INITIALLY DEFERRED; SET CONSTRAINTS fkey IMMEDIATE

  This results in the following error due to the command that we send to workers: "cannot insert multiple commands into a prepared statement"

  To fix those errors, we need to properly deparse the following commands without relying on the original DDL statement:
  - [x] ALTER TABLE DROP CONSTRAINT (Fixed by https://github.com/citusdata/citus/pull/7012)
  - [x] ALTER TABLE ADD COLUMN CONSTRAINT (Fixing this by https://github.com/citusdata/citus/pull/7032)
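
As a rough illustration of why deparsing helps with the multi-command error described for Django, the toy Python model below contrasts two ways of building the worker command. It is only a sketch under stated assumptions: the function names are hypothetical, the splitter is a naive split on ";" that ignores quoting and comments, and the real logic lives in the C parser and deparser, not in anything resembling this code.

```python
def split_statements(query: str) -> list[str]:
    """Naively split a query string on ';' (ignores quoting and comments)."""
    return [part.strip() for part in query.split(";") if part.strip()]


def is_ddl(statement: str) -> bool:
    """Toy check: only ALTER TABLE is treated as DDL that must reach workers."""
    return statement.upper().startswith("ALTER TABLE")


def worker_commands_from_original(query: str) -> list[str]:
    """Model of the failing path: each DDL statement reuses the whole original string."""
    return [query for stmt in split_statements(query) if is_ddl(stmt)]


def worker_commands_deparsed(query: str) -> list[str]:
    """Model of the fixed path: each DDL statement is rebuilt as its own command."""
    return [stmt for stmt in split_statements(query) if is_ddl(stmt)]


def is_single_command(command: str) -> bool:
    """A command is safe for a prepared statement only if it holds one statement."""
    return len(split_statements(command)) == 1


django_query = (
    "SET CONSTRAINTS fkey IMMEDIATE; "
    "ALTER TABLE referencing_tbl DROP CONSTRAINT fkey;"
)

for command in worker_commands_from_original(django_query):
    print("original string :", is_single_command(command), "|", command)

for command in worker_commands_deparsed(django_query):
    print("deparsed command:", is_single_command(command), "|", command)
```

In this model the command built from the original string still contains both statements, which is the shape that a prepared statement rejects, while the deparsed command contains only the ALTER TABLE.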
- By default, Django (and, I believe, some other frameworks) adds an "int"-based "id" column as a generated identity column to the model tables.

  With https://github.com/citusdata/citus/pull/7008, we allowed using such generated identity columns in distributed tables to avoid breaking Django migrations. However, any nextval() call made for the underlying sequence of such a column results in the following error on workers: "nextval: reached maximum value of sequence"

  Altering such a column to a bigint-based one later on is not possible either: "cannot execute ALTER COLUMN command involving identity column".

  Plus, undistributing a table that uses an identity column is not allowed either, which breaks some table-type-conversion operations such as creating a reference table from a Citus local table. This becomes an important problem, e.g., when creating reference tables for the shared data stored in the public schema. Shared tables might have foreign keys to each other (as is the case for some built-in Django applications), and calling create_reference_table() for such tables might yield an error if one of those shared tables has automatically been converted to a Citus local table due to a foreign key to a shared Citus reference table.
  - [ ] (High priority) (Medium) Enable altering identity columns (at least the underlying datatype).
  - [x] (High priority) (Medium) Allow undistributing tables that have an identity column. If that's hard, at least have a smarter logic to convert Citus local tables to single-shard tables / reference tables without undistributing the table first. (https://github.com/citusdata/citus/pull/7131)
  - [ ] (High priority) (Hard) Ultimately, get rid of the limitations regarding the usage of identity columns in Citus tables.
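
For context on the nextval() error reported for "int"-based identity columns, here is a small Python model of how worker-side sequence values might be assigned. Everything in it is an assumption for illustration: the 48-bit split per node group and the idea that non-bigint sequences are left exhausted on workers reflect my reading of the current behavior and should be checked against the code; the function and exception names are hypothetical.

```python
VALUE_BITS = 48  # assumed: low 48 bits hold the counter, upper bits the node group id


class SequenceExhausted(Exception):
    """Stands in for the 'reached maximum value of sequence' error."""


def nextval_on_worker(group_id: int, column_type: str, calls_so_far: int = 0) -> int:
    """Return the value a worker in `group_id` would hand out on its next call.

    Assumption: bigint sequences get a per-group slice of the value space,
    while int / smallint sequences have no room for such a slice and are
    left at their maximum, so any nextval() on a worker fails.
    """
    if column_type != "bigint":
        raise SequenceExhausted("nextval: reached maximum value of sequence")
    start = (group_id << VALUE_BITS) + 1
    return start + calls_so_far


print(nextval_on_worker(1, "bigint"))
print(nextval_on_worker(2, "bigint"))

try:
    nextval_on_worker(1, "int")
except SequenceExhausted as exc:
    print("worker error:", exc)
```

If this model is roughly right, it also suggests why being able to alter the identity column to bigint matters: only the wider type leaves enough bits to give every node group its own range.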