docs icon indicating copy to clipboard operation
docs copied to clipboard

sql: check constraint should be added for better performance on uniqueness check in `PARTITION ALL BY` tables

Open chengxiong-ruan opened this issue 3 years ago • 0 comments

Chengxiong Ruan (chengxiong-ruan) commented:

Looks like we lack of documentation on PARTITION ALL BY in general: https://www.cockroachlabs.com/docs/stable/create-table.html https://www.cockroachlabs.com/docs/stable/partitioning https://www.cockroachlabs.com/docs/stable/alter-table https://www.cockroachlabs.com/docs/stable/partition-by

One thing important to notice users is that how check constraint on partitioning fields affect INSERT performance on implicit partitioning with PARTITION ALL BY. For example, the following table is partitioned ALL on part, but there's not check constraint about what possible values part may have. A simple insert would cause a full table scan on this table:

CREATE TABLE t (
  id INT PRIMARY KEY,
  part STRING
) PARTITION ALL BY LIST (part) (
  PARTITION us_west VALUES IN (('seattle')),
  PARTITION us_east VALUES IN (('new york'))
);

[email protected]:26257/defaultdb> explain insert into t values (1, 'seattle');
                    info
--------------------------------------------
  distribution: local
  vectorized: true

  • root
  │
  ├── • insert
  │   │ into: t(id, part)
  │   │
  │   └── • buffer
  │       │ label: buffer 1
  │       │
  │       └── • values
  │             size: 2 columns, 1 row
  │
  └── • constraint-check
      │
      └── • error if rows
          │
          └── • hash join (right semi)
              │ equality: (id) = (column1)
              │ right cols are key
              │ pred: column2 != part
              │
              ├── • scan
              │     missing stats
              │     table: t@primary
              │     spans: FULL SCAN
              │
              └── • scan buffer
                    estimated row count: 1
                    label: buffer 1

However, if a check constraint is added to tell the set of allowed values of part. The insert would be more efficient with a lookup:

CREATE TABLE t (
  id INT PRIMARY KEY,
  part STRING CHECK (part IN ('seattle', 'new york'))
) PARTITION ALL BY LIST (part) (
  PARTITION us_west VALUES IN (('seattle')),
  PARTITION us_east VALUES IN (('new york'))
);

[email protected]:26257/defaultdb> explain insert into t values (1, 'seattle');
                                          info
----------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • root
  │
  ├── • insert
  │   │ into: t(id, part)
  │   │
  │   └── • buffer
  │       │ label: buffer 1
  │       │
  │       └── • values
  │             size: 3 columns, 1 row
  │
  └── • constraint-check
      │
      └── • error if rows
          │
          └── • lookup join (semi)
              │ table: t@primary
              │ lookup condition: (column1 = id) AND (part IN ('new york', 'seattle'))
              │ pred: column2 != part
              │
              └── • scan buffer
                    label: buffer 1

Jira Issue: DOC-2618

chengxiong-ruan avatar Feb 11 '22 22:02 chengxiong-ruan