cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

opt: collect stats for virtual computed columns

Open mgartner opened this issue 3 years ago • 3 comments

Once #68186 is addressed, we will no longer collect stats for virtual computed columns. This can prevent the optimizer from accurately calculating the cost of scanning an index on a virtual column, and, transitively, the cost of scanning an expression index.

Here's an example:

CREATE TABLE t (k INT PRIMARY KEY, a INT, INDEX ((a%3)));

INSERT INTO t SELECT i, 1 FROM generate_series (1, 10) as s(i);
INSERT INTO t SELECT i, 2 FROM generate_series (11, 100) as s(i);
INSERT INTO t SELECT i, 3 FROM generate_series (101, 1000) as s(i);

ANALYZE t;

EXPLAIN SELECT k FROM t WHERE a%3 = 0;
                                      info
---------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 10 (1.0% of the table; stats collected 12 seconds ago)
    table: t@t_expr_idx
    spans: [/0 - /0]
(7 rows)

EXPLAIN SELECT k FROM t WHERE a%3 = 1;
                                      info
---------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 10 (1.0% of the table; stats collected 15 seconds ago)
    table: t@t_expr_idx
    spans: [/1 - /1]
(7 rows)

EXPLAIN SELECT k FROM t WHERE a%3 = 2;
                                      info
---------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 10 (1.0% of the table; stats collected 17 seconds ago)
    table: t@t_expr_idx
    spans: [/2 - /2]
(7 rows)

Notice that the estimated row count for each scan is the same, despite the fact that there are 10 rows where a%3 = 1, 90 rows where a%3 = 2, and 900 rows where a%3 = 0.

In order to better estimated row counts in queries like this, we will have to:

  1. Collect statistics for virtual computed columns. Currently column statistics are sampled from the primary index. Virtual columns are not included in primary indexes, so there will be somewhere to sample virtual columns from indexes on them.
  2. Propagate virtual column statistics in statistics builder. A canonical scan on a primary index does not produce a virtual column. Instead, virtual column values are calculated at query execution time by wrapping a scan with a projection that produces the virtual column. Scans only contain statistics for columns they produce, so the challenge will be either producing virtual column statistics from a scan or from the projection that produces them.

Jira issue: CRDB-13888

Epic: CRDB-8949

mgartner avatar Jul 29 '21 21:07 mgartner

We have some customer requests to get this done in 23.2

rytaft avatar Feb 17 '23 16:02 rytaft

Also see https://github.com/cockroachdb/cockroach/issues/85176 and https://github.com/cockroachdb/cockroach/issues/78181 which are similar.

michae2 avatar Mar 07 '23 19:03 michae2

Another example of this coming up with an expression index:

CREATE TABLE t (i INT PRIMARY KEY, j JSONB);
INSERT INTO t SELECT i, json_build_object('a', i % 10) FROM generate_series(0, 9999) AS s(i);

-- Say we wanted to find rows with j->'a' = '3' but we did not want to index all of j.
-- In 23.2 we could use a forward index on j->'a' instead of an inverted index:
CREATE INDEX ON t ((j->'a'));
ANALYZE t;

-- This does use the index, but the lack of stats means we estimate 10 rows instead of 1000:
EXPLAIN SELECT * FROM t WHERE j->'a' = '3';

-- A workaround until stats are collected on virtual computed columns is to index a stored computed column:
DROP INDEX t_expr_idx;
ALTER TABLE t ADD COLUMN ja JSONB AS (j->'a') STORED;
CREATE INDEX ON t (ja);
ANALYZE t;

-- Now this uses the index and has good stats:
EXPLAIN SELECT * FROM t WHERE j->'a' = '3';

michae2 avatar Oct 06 '23 16:10 michae2

Hi @michae2, please add branch-* labels to identify which branch(es) this GA-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Mar 06 '24 18:03 blathers-crl[bot]