yugabyte-db icon indicating copy to clipboard operation
yugabyte-db copied to clipboard

[YSQL] ysql_dump omits SPLIT INTO/SPLIT AT clause for an index after ALTER TABLE ADD UNIQUE constraint using the index

Open yifanguan opened this issue 3 years ago • 3 comments

Jira Link: DB-982

Description

For backup, ysql_dump omits the number of tablets specified by the SPLIT INTO/SPLIT AT clause for an index creation after ALTER TABLE ADD UNIQUE constraint using the index. For restore, it simply uses the default index creation: HASH partition with number: ysql_num_shards_per_tserver * num_of_tserver tablets for both range and hash partitioned index and doesn't care how many tablets the index has.

The main problem is after specifying a constraint using an index, ysql_dump treats the index as a constraint instead of creating the index first and then adding the constraint. A straightforward solution is: for ysql_dump, don't write ALTER TABLE ADD CONSTRAINT directly, separate all such statements into two steps (1) create an index for the constraint if needed (2) then ALTER TABLE ADD CONSTRAINT using the created index. Step (1) can allow our SPLIT INTO/SPLIT AT clause syntax for index creation.

Examples: (1) For SPLIT AT range-partitioned index,

CREATE TABLE mytbl_range (k INT PRIMARY KEY, v TEXT);
CREATE UNIQUE INDEX myidx_range ON mytbl_range (v ASC) SPLIT AT VALUES (('foo'), ('qux'));

ALTER TABLE mytbl_range ADD UNIQUE USING INDEX myidx_range;

INSERT INTO mytbl_range (k, v) VALUES (1, 'bar'), (2, 'jar'), (3, 'tar');

Open localhost:7000 or yb-admin -master_addresses 127.0.0.1:7100 list_tablets ysql.yugabyte mytbl_range to verify myidx_range is range-partitioned and has 3 tablets. However, the generated output from ysql_dump backup is (only paste the most important part here):

CREATE TABLE public.mytbl_range (
    k integer NOT NULL,
    v text,
    CONSTRAINT mytbl_range_pkey PRIMARY KEY((k) HASH)
)
SPLIT INTO 2 TABLETS;


ALTER TABLE public.mytbl_range OWNER TO yugabyte;

--
-- Name: mytbl_range myidx_range; Type: CONSTRAINT; Schema: public; Owner: yugabyte
--

ALTER TABLE ONLY public.mytbl_range
    ADD CONSTRAINT myidx_range UNIQUE (v);

myidx_range is treated as a constraint, and it creates an index implicitly. If we use the dump output to restore the database, we can find that myidx_range is a hash-partitioned index with 2 tablets (the default number).

(2) For SPLIT INTO hash-partitioned index, the problem exists as well

CREATE TABLE mytbl_hash (k INT PRIMARY KEY, v TEXT);
CREATE UNIQUE INDEX myidx_hash ON mytbl_hash (v) SPLIT INTO 10 TABLETS;

ALTER TABLE mytbl_hash ADD UNIQUE USING INDEX myidx_hash;

INSERT INTO mytbl_hash (k, v) VALUES (1, 'bar'), (2, 'jar'), (3, 'tar');

Dump output:

CREATE TABLE public.mytbl_hash (
    k integer NOT NULL,
    v text,
    CONSTRAINT mytbl_hash_pkey PRIMARY KEY((k) HASH)
)
SPLIT INTO 2 TABLETS;

ALTER TABLE public.mytbl_hash OWNER TO yugabyte;

--
-- Name: mytbl_hash myidx_hash; Type: CONSTRAINT; Schema: public; Owner: yugabyte
--

ALTER TABLE ONLY public.mytbl_hash
    ADD CONSTRAINT myidx_hash UNIQUE (v);

We should expect the restored myidx_hash to have 10 tablets, but it has 2 tablets instead.

yifanguan avatar Feb 24 '22 15:02 yifanguan

This problem with SPLIT INTO can be solved by generic YSQL backup/restore handling for any partitioning mismatch Commits b14485a5fdd8220a23e4f685d84b44e501b3bb15 and 96beb9e77e5d6e4032106bdbc52fcd15df5f6c05.

Although this generic YSQL backup/restore can also handle partition mismatch for range-partitioned tables/indexes. In this case, we still have a problem with SPLIT AT. That is, the restored index in the example above is hash-partitioned based on the syntax (the default scheme). However, I assume the metadata stored says the index is range-partitioned.

yifanguan avatar Feb 24 '22 23:02 yifanguan

The test YBBackupTest.TestYSQLRangeSplitConstraint is identical to SPLIT AT scenario shown above.

  1. It failed with the error:
Error running import_snapshot: Internal error (yb/master/catalog_manager_ent.cc:1625): Unable to import snapshot meta file /tmp/yb_backup_bgealjkzsnylbhpp/SnapshotInfoPB: Invalid created PGSQL_TABLE_TYPE table

This happened before the generic backup/restore RepartitionTable.

  1. The created index is range partitioned.
[m-1] name: "myidx"
[m-1] schema {
...
[m-1] num_tablets: 3
[m-1] partition_schema {
[m-1]   range_schema {
[m-1]     columns {
[m-1]       name: "v"
[m-1]     }
[m-1]     columns {
[m-1]       name: "ybuniqueidxkeysuffix"
[m-1]     }
[m-1]     splits {
[m-1]       column_bounds: "Sfoo\000\000!"
[m-1]     }
[m-1]     splits {
[m-1]       column_bounds: "Squx\000\000!"
[m-1]     }
[m-1]   }
[m-1] }
[m-1] table_type: PGSQL_TABLE_TYPE
...
  1. The re-created index is hash partitioned.
[m-1] name: "myidx"
[m-1] schema {
...
[m-1] num_tablets: 3
[m-1] partition_schema {
[m-1]   hash_schema: PGSQL_HASH_SCHEMA
[m-1] }
[m-1] table_type: PGSQL_TABLE_TYPE
...

yifanguan avatar Feb 25 '22 00:02 yifanguan

Just a note, this was likely introduced on the ALTER .. ADD UNIQUE side when adding HASH column option (in addition to ASC and DESC). In vanilla PG, only the default sorting is allowed for ALTER .. ADD UNIQUE. See example below (works the same in both YB and PG):

postgres=# CREATE TABLE pg_t1 (k INT PRIMARY KEY, v TEXT);
CREATE TABLE
postgres=#
postgres=# CREATE UNIQUE INDEX pg_idx1 ON pg_t1 (v ASC);
CREATE INDEX
postgres=# ALTER TABLE pg_t1 ADD UNIQUE USING INDEX pg_idx1;
ALTER TABLE
postgres=# CREATE UNIQUE INDEX pg_idx2 ON pg_t1 (v DESC);
CREATE INDEX
postgres=# ALTER TABLE pg_t1 ADD UNIQUE USING INDEX pg_idx2;
ERROR:  index "pg_idx2" does not have default sorting behavior
LINE 1: ALTER TABLE pg_t1 ADD UNIQUE USING INDEX pg_idx2;
                              ^
DETAIL:  Cannot create a primary key or unique constraint using such an index.

This means pg_dump does need to handle non-default sorting cases. But in Yugabyte we added HASH as a default for the first column and still allow the ALTER above for ASC for backwards compatibility.

See also the relevant code: https://github.com/yugabyte/yugabyte-db/blob/master/src/postgres/src/backend/parser/parse_utilcmd.c#L2379

m-iancu avatar Feb 28 '22 20:02 m-iancu

This issue has been fixed as part of 13603. Examples of SPLIT INTO and SPLIT AT clauses are included in yb_ysql_dump.data.sql.

emhna avatar Sep 26 '22 16:09 emhna