Accelerate one-time queries against big data sets

Open zhouqingqing opened this issue 3 years ago • 0 comments

We need foreign table scans to speed up debug queries against big data sets. Currently we have to load the whole data set into memory, collect stats, and only then run the query. This is slow when the data set is big, although the cost pays off when a batch of queries runs against the same loaded data.

To solve this problem, we need the following:

  1. DDL to persist and read back stats: the basic functionality is already there. See statis.cs.
  2. Support foreign tables with syntax like this:
CREATE FOREIGN TABLE A(i int)
        OPTIONS ( filename 'data/data1.csv', format 'csv' );

Note that we already have PhysicScanFile, which can read from csv.

With the above in place, we can:

  1. Load the data set once, collect stats, and persist the stats.
  2. For any later query that uses the foreign table, load the persisted stats and read the csv directly.
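
The two steps above can be sketched roughly as follows. This is an illustration in Python, not qpmodel code: the helper names (`collect_stats`, `save_stats`, `load_stats`, `scan_csv`), the JSON stats file, and the choice of row count, min, max, and distinct count as the persisted stats are all assumptions made for the example. It also assumes a headerless csv with integer columns, matching the `A(i int)` example.

```python
import csv
import json
import os
import tempfile


def collect_stats(csv_path, columns):
    """One-time pass over the csv: row count plus min/max/distinct per column."""
    rows = 0
    seen = {name: set() for name in columns}
    with open(csv_path, newline="") as f:
        for record in csv.reader(f):
            rows += 1
            for name, raw in zip(columns, record):
                seen[name].add(int(raw))  # assumption: every column is an int
    return {
        "rows": rows,
        "columns": {
            name: {
                "min": min(values, default=None),
                "max": max(values, default=None),
                "n_distinct": len(values),
            }
            for name, values in seen.items()
        },
    }


def save_stats(stats, stats_path):
    """Persist stats so later sessions can skip the collection pass."""
    with open(stats_path, "w") as f:
        json.dump(stats, f)


def load_stats(stats_path):
    """Read back previously persisted stats."""
    with open(stats_path) as f:
        return json.load(f)


def scan_csv(csv_path, columns):
    """Stream rows one at a time instead of loading the table into memory."""
    with open(csv_path, newline="") as f:
        for record in csv.reader(f):
            yield {name: int(raw) for name, raw in zip(columns, record)}


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        data_path = os.path.join(workdir, "data1.csv")
        with open(data_path, "w", newline="") as f:
            f.write("3\n1\n3\n7\n")
        stats_path = data_path + ".stats.json"  # hypothetical naming scheme

        # Step 1: done once per data set.
        save_stats(collect_stats(data_path, ["i"]), stats_path)

        # Step 2: done for every later query; no full load, no re-collection.
        stats = load_stats(stats_path)
        matches = [row for row in scan_csv(data_path, ["i"]) if row["i"] > 2]
        return stats, matches
```

Calling `demo()` returns the reloaded stats together with the rows that pass the filter `i > 2`. The point of the split is that the expensive pass in step 1 happens once, while each later debug query only pays for reading a small stats file and streaming the csv.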

zhouqingqing, Dec 19 '20 17:12